0% found this document useful (0 votes)

17 views72 pages

SLMContents T3531 RProgramming 1734522316054

The document outlines a course on R programming for MBA students, focusing on data analysis skills. It includes learning objectives, recommended books, and a detailed table of contents covering installation, data structures, user-defined functions, and graphical analysis. The course aims to equip students with essential R programming knowledge for data analytics and statistical programming.

Uploaded by

Suyi Suyi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views72 pages

SLMContents T3531 RProgramming 1734522316054

Uploaded by

Suyi Suyi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 72

SYMBIOSIS INTERNATIONAL

(DEEMED UNIVERSITY)
Established under Section 3 of the UGC Act. 1956
Awarded Category - I by UGC

Symbiosis School for Online and

Digital Learning
Gram: Lavale, Tal: Mulshi, Dist: Pune, Maharashtra,
India Pin: 412115

E-CONTENT
R PROGRAMMING
MBA SEM – II
DR. BAJEET KAUR
Learning Objective/Outcome (s):
This course will help students to understand the R programming basics. After
completion of this course student can able to do data analysis using R
programming.

Pre-learning:
Basic knowledge of statistics

Books Recommended
 R for Data Science: 1st Editionby Hadley Wickham , Garrett
Grolemund , ISBN-13: 978- 1491910399 ISBN-10: 1491910399
 Hands-On Programming with R, Garrett Grolemund ISBN-13: 978-
1449359010,ISBN- 10: 1449359019
 R Cookbook: Paul Teetor ISBN-13: 978-0596809157,ISBN-10: 0596809158
 Curran, J.M. (2010) Introduction to Data Analysis with R for Forensic
Scientists, ISBN: 978- 1420088267
 Murrell, P (2005) R Graphics, ISBN: 978-1584884866
 Murrell, P Introduction to Data Technologies
www.stat.auckland.ac.nz/~paul/ItDT
Table of Contents
1.1 Installing R.................................................................................................................................. 5
Installing RStudio ............................................................................................................... 5
Steps to Install R Studio (Windows):.................................................................................. 5
1.2 Working with R ........................................................................................................................ 13
1.3 Working with R ........................................................................................................................ 13
1.4 Help feature .............................................................................................................................. 15
1.5 R workspace.............................................................................................................................. 16
1.6 Input and output ........................................................................................................................ 16
INPUT :............................................................................................................................. 16
OUTPUT........................................................................................................................... 16
1.7 Packages ................................................................................................................................... 18
What are packages? .......................................................................................................... 18
Installing a package .......................................................................................................... 19
Loading a package ............................................................................................................ 19
1.8 Working with large datasets ..................................................................................................... 19
2 Module 2 ........................................................................................................................................... 21
2. Dataset and Data ........................................................................................................................... 21
Datasets ............................................................................................................................. 21
2.2 Data structures .......................................................................................................................... 21
Vectors .............................................................................................................................. 21
Matrices ............................................................................................................................ 22
Arrays................................................................................................................................ 23
Data frames ....................................................................................................................... 24
Factors............................................................................................................................... 25
Lists................................................................................................................................... 26
2.3 Data input.................................................................................................................................. 27
Importing data from a delimited text file .......................................................................... 27
Importing data from Excel ................................................................................................ 28
Importing data from CSV ................................................................................................. 28
Importing data from JSON................................................................................................ 29
Importing data from XML ................................................................................................ 29
2.4 Useful functions for working with data objects ........................................................................ 30
2.5 R – Programming Constructs:................................................................................................... 32
Decision making ............................................................................................................... 32
3 User defined functions in R. ............................................................................................................. 37
3.1 FunctionComponents ...................................................................................................................... 37
3.2 Built-inFunction ........................................................................................................................ 37
3.3 User-definedFunction ...................................................................................................................... 38
3.4 CallingaFunction ........................................................................................................................... 38
4 Graphical Analysis using R .............................................................................................................. 40
4.1 Introduction: ............................................................................................................................. 40
4.2 Bar plots .................................................................................................................................... 40
Simple bar plots: ............................................................................................................... 41
Stacked and grouped bar plots: ......................................................................................... 41
Mean bar plots: ................................................................................................................. 42
Tweaking bar plots: .......................................................................................................... 43
Spinograms: ...................................................................................................................... 44
4.3 Box plots: .................................................................................................................................. 44
Using parallel box plots to compare groups: .................................................................... 45
4.4 Dot plots: .................................................................................................................................. 48
4.5 Pie charts:.................................................................................................................................. 50
5 Advanced R ...................................................................................................................................... 53
5.1 Correlations:............................................................................................................................ 53
Types of correlations: ....................................................................................................... 53
PARTIAL CORRELATIONS: ......................................................................................... 54
OTHER TYPES OF CORRELATIONS .......................................................................... 55
5.2 Testing correlations for significance:........................................................................................ 55
5.3 Regression: ............................................................................................................................... 56
The many faces of regression: .......................................................................................... 56
Scenarios for using OLS regression:................................................................................. 57
Simple linear regression: .................................................................................................. 58
5.4 Polynomial regression: ............................................................................................................. 60
5.5 Fitting ANOVA models: ........................................................................................................... 61
Two-way factorial ANOVA: ............................................................................................ 67
5.6 ANOVA as regression: ............................................................................................................. 70
1 Module 1: Introduction to R programming
Dear Learners, the objective of this course is to teach beginners to understand the R programming
basics and enable them to do data analysis using R programming. A variety of topics will be covered
that are important for Data Analytics in order to prepare the students for real life prediction of
data engineering. This course will impart knowledge of the concepts related Data Types in R,
Programming constructs in R, Reading different file formats, User defined functions in R,
Graphs and Charts in R, Statistical data analysis and Web Scraping using R. It also gives the
idea how data is managed in various environments with emphasis on Predictions measures as
implemented in data sets. Statistical Programming in R is a multi-part course designed to get
you up to speed with the most important and powerful methodologies in statistics. This course
is designed to prepare you to do data analysis in R, from simple computations to machine
learning. This course has been written from scratch. This course assumes you are comfortable
with basic math, algebra, and logical operations.

Topic to be covered:
Getting R, Managing R, Arithmetic and Matrix Operations, Introduction to Functions,
Control Structures. Working with Objects and Data: Introduction to Objects,
Manipulating Objects, Con//structing Data Objects, types of Data items, Structure of Data
items, Reading and Getting Data, Manipulating Data, Storing Data.
1.1 Installing R
To install R to work on your own computer, you can download it freely from the
Comprehensive R Archive Network (CRAN). Note that CRAN makes several versions
of R available: versions for multiple operating systems and releases older than the current
one. You want to read the CRAN instructions to assure you download the correct version.
If you need further help, you can try the following resources:
 Installing R on Windows
 Installing R on Mac
 Installing R on Ubuntu
Installing RStudio
RStudio is an integrated development environment (IDE). We highly recommend
installing and using RStudio to edit and test your code. You can install RStudio through
the RStudio website. Their cheatsheet is a great resource. You must install R before
installing RStudio.

Steps to Install R Studio (Windows):

Step 1: Open the CRAN website in your browser. Open your browser and go to URL (
https://cran.r-project.org/ )
 Click on Download R for Windows
Step 2: Click on Base

Step 3: Click on Download R 4.1.1 for Windows (86 megabytes, 32/64 bit)
Step 4: Save your file

Step 5: Open and Run the .exe file just downloaded

 Let the default setting as it is.

Step 6: Choose Language (For now let it be “English”)

Step 7: Click Next

Step 8: Click Next

Step 9: Change the location of R if needed otherwise let it be default

Step 10: Check the required components or leave it default

Step 11: Click on Next

Step 12: Again click on next

Step 13: Select your choices and click next

Step 14: R will start installing. Wait for a moment

Step 16: Your R is successfully installed if you see this screen.

Step 17: Go to the folder where R is installed and go inside bin folder.
Like here path ( C:\Program Files\R\R-4.1.1\bin )

Step 18: Click on R.exe. and run your commands. Finally Done
1.2 Working with R

 R is a case-sensitive and interpreted language.

 You can either enter commands one at a time at the command prompt (>) or run a
set of commands from a source file.
 R provides a wide variety of data types, including vectors, matrices, data frames
(similar to datasets), and lists (collections of objects).
 Most of the functionalities are provided through built-in and user-created functions.
Basic functions are available in R by default. Other functions are contained in
packages that can be attached to a current session as needed.
 Statements contains functions and assignments.
 R make use of the symbol <- for assignments, rather than the typical = sign.

o For example, the statement x <- rnorm(5) creates a vector object named
x which contain five random deviates from a standard normal distribution.

Note: R allows you to use the = sign to be used for object assignments.
However, you would rarely find any program written that way, as it
is not a standard syntax.
 Comments are preceded by the # symbol. All texts appearing after the # is ignored
by the R interpreter.

1.3 Working with R

 In case you are using Windows, launch R from the Start Menu.
 For Mac, double-click the R icon in the Applications folder.
 While using Linux, type R at the command prompt of a terminal window
Let us understand R interface with the help of example.

Suppose you are studying the physical growth of infants in first year of their life. You have
data of the age and weight of ten infants in their first year of life in below table. You would
be interested in weight distribution and their relationship to age.

Weight (kg)
Age (months)
01 4.4
03 5.3
05 7.2
02 5.2
11 8.5
09 7.3
03 6.0
09 10.4
12 10.2
03 6.1

Table: Age and Weight of 10 infants

 Enter the age and weight data as vectors, using the function c (), which combines
its arguments into a vector or list.
 You can apply following built-in functions on the data
o mean and standard deviation of the weights
o correlation between age and weight
o plot the relationship between age and weight so that you can inspect any trend
visually.

A sample R Session

 the mean weight for 10 infants is 7.06 kilograms

 that the standard deviation is 2.08 kilograms
 there is a strong linear relationship between age in months and weight in kilograms
(correlation = 0.91).
 The relationship is also visible in the scatter plot in below figure. It’s
not surprising, that as infants get older, they tend to weigh more.
Scatter Plot for infant’s weight by age

1.4 Help feature

R provides an extensive help facility, and learning to navigate them will help you
significantly in your programming efforts.

R Help interface

 You can use help window in R Studio Environment to access R documentation. The
function help.start() open the browser window with access to introductory and
advanced manuals, FAQs, and reference materials. R provides extensive
help facilities, and learning to navigate them will definitely help in your
programming efforts.
1.5 R workspace
 The workspace is your current R working environment and it includes any user-
defined objects (vectors, matrices, functions, data frames, or lists).
 At the end of R session, you can save the current workspace that’s automatically
reloaded the next time R starts.
 You can use the up and down arrow keys for scrolling through your command
history. By doing so it allows you to select a previous command, edit it if desired,
and resubmit it using the Enter key.
 The current working directory is the directory R will read files from and save results
to by default.
o getwd() function : To find the current working directory in use.
o setwd() function: set the current working directory by using this function. If
you need to input a file that isn’t in the current working directory, use the
full pathname in the call. Always enclose the names of files and directories
from the operating system in quote marks. Some standard commands for
managing your workspace are listed in below table.

Functions for Managing R Workspace

1.6 Input and output

By default, launching R starts with an interactive session with input from the keyboard and output
to the screen. You can also process commands from a script file (a file containing R statements)
and direct output to a variety of destinations.

INPUT :

The source("filename") function submits a script to the current session. In case the filename
doesn’t include a path, the file is assumed to be in the current working directory. For example,
source("myscript.R") runs a set of R statements contained in file myscript.R. By convention,
script file names end with an. R extension, but this isn’t required.

OUTPUT

The sink("filename") function redirects output to the file filename. If the file already exists by
default, its contents are overwritten. Include the option append=TRUE to append text to the file
rather than overwriting it. Including the option split=TRUE will send output to both the screen
and the output file. Issuing the command sink () without options will return output to the screen
alone. GRAPHIC OUTPUT Although sink () redirects text output, as it has no effect on graphic
output. To redirect graphic output, use one of the functions listed in Table below. Use dev.off()
to return output to the terminal.

Functions for saving graphic output

Let’s put it together with an example. Assume that you have three script files containing R code
(script1.R, script2.R, and script3.R). Issuing the statement source ("script1.R") will submit the R
code from script1.R to the current session and the results will appear on the screen. If you then
issue the statements

sink("myoutput", append=TRUE, split=TRUE)

pdf("mygraphs.pdf")
source("script2.R")
The R code from file script2.

R will be submitted, and the results will appear on the screen. In addition to that, the text output
will be appended to the file myoutput , and the graphic output will be saved to the
file mygraphs.pdf .
Finally, if you issue the statements

sink() dev.off()
source("script3.R")
The R code from script3.

R will be submitted, and the results will appear on the screen. This time, no text or graphic
output is saved to files. The sequence is outlined in below figure. R gives quite a bit of
flexibility and control over where input comes from and where it goes.
Input with source() function and output with sink() function.

1.7 Packages

R comes with an extensive capability right out of the box. But some of its most exciting features
are available as optional modules that you can download and install. There are more than 2,500
user-contributed modules called packages that you can download from http://cran.r-
project.org/web/packages. They provide a large range of new capabilities, from the analysis of
geostatistical data to protein mass spectra processing to the analysis of psychological tests! We
will use many of these optional packages in later chapters.

What are packages?

Packages are a collection of R functions, data, and compiled code in a well-defined format. The
directory where packages are stored on your computer is known as the library. The function
.libPaths() shows you where your library is located, and the function library () shows you what
packages you’ve saved in your library.

R comes with a standard set of packages (including base, datasets, utils, grDevices, graphics,
stats, and methods). They give access to a wide range of functions and datasets that are available
by default. Other packages are also available for download and installation. After installing,
they have to be loaded into the session in order to be used. The command search () tells you which
packages are loaded and ready to use.
Installing a package

There are many R functions that allow you to use packages.

 To install the package for the first time, use the install.packages () command. For
example, installing Packages () without options brings up a list of CRAN screen sites.
Once you have selected a site, you will be presented with a list of all available packages.
Selecting one will download and install it. If you know which package you want to install,
you can do it directly by providing it as a work dispute. For example, the gclus package
contains works to build advanced distribution sites.
 You can download and install the package with the
installation command.packages ("gclus"). You only need to install the package once. But
like any software, packages are often updated by their authors.
 Use the update.packages () command to update any packages you have installed.
 To view details in your packages, you can use the installed.packages () command. It lists
the packages you have, as well as their version numbers, dependencies, and other
information.

Loading a package
Install the package download from the CRAN screen and save it to your library. To use it in R,
you need to load the package using the library command (). For example, using a
combined gclus to extract a library (gclus). Of course, you have to install the package before you
can load it. You will only need to upload the package once within the given session. If you wish,
you can customize your startup environment to automatically download the packages you use
most often.

1.8 Working with large datasets

System planners often ask if R can handle big data problems. Usually, they work with large
amounts of data collected from web, weather, or genetic research. Because R holds objects in
memory, you are often limited in the amount of RAM available. For example, on my five-year-
old Windows PC with 2 GB of RAM, I was able to easily manage 10 million data sets (100
variables per 100,000 views). On an iMac with 4 GB of RAM, I was able to handle 100 million
items without any hassle. But there are two problems to consider: the size of the database and the
mathematical methods to be used. R can handle data analysis problems from gigabyte to terabyte
range, but special procedures are required.
2 Module 2
Topics to be covered:

Data Types in R
Different vector operations
Programming constructs in R
Arrays
Lists

2. Dataset and Data

The first step of any data analysis is the creation of a dataset containing the information to be
studied, in a format that meets your needs. In R, this task involves the following:

■ selecting a data structure to hold your data

■ Entering or importing your data into the data structure

These data sources may include text files, spreadsheets, statistical packages, and data
management systems. For example, the data I work with usually comes from SQL information.
It is possible to use only one or two of the methods described in this section, so feel free to choose
the ones that suit your situation. Once the database is created, you will define it, adding flexible
descriptive labels and dynamic codes. Let's start with the basics.

Datasets

A dataset is usually a rectangular array of data with rows representing observations and columns
representing variables..

2.2 Data structures

R provides a wide variety of objects for holding data, including scalars, vectors, matrices, arrays,
data frames, and lists. They usually differ in terms of the type of data they can hold, how they’re
created, their structural complexity, and the notation used to identify and access individual
elements. Data types are used to store information. In R, we do not need to declare a variable
as some data type. The variables are assigned with R-Objects and the data type of the R-object
becomes the data type of the variable.

Vectors
• A vector is an ordered collection of basic data types of a given length.
• The only key thing here is all the elements of a vector must be of the identical data type e.g
homogenous data structures.
• Vectors are one-dimensional data structures. There are five atomic classes of Vectors.
Vectors are basically one-dimensional arrays which can hold numeric data, character data, or
logical data. The combine function c() is used to form the vector. Here are examples of each type
of vector:
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

Here, a is numeric vector, b is a character vector, and c is a logical vector. Note that the data in a
vector must only be one type or mode (numeric, character, or logical). You can’t mix modes in
the same vector.

NOTE Scalars are one-element vectors. Examples include f <- 3, g <- "US" and h <- TRUE.
They’re used to hold constants.
You can refer to elements of a vector using a numeric vector of positions within brackets. For
example, a[c(2, 4)] refers to the 2nd and 4th element of vector a. Here are additional examples:

> a <- c(1, 2, 5, 3, 6, -2, 4)

> a[3] [1] 5
> a[c(1, 3, 5)] [1] 1 5 6
> a[2:6] [1] 2 5 3 6 –2
The colon operator used in the last statement is used to generate a sequence of numbers. For
example, a <- c(2:6) is equivalent to a <- c(2, 3, 4, 5, 6).

Matrices
• Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular
layout.
• A Matrix is created using the matrix() function.
• Example: matrix(data, nrow, ncol, byrow, dimnames) where,
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical clue. If TRUE then the input vector elements are arranged by row.
• dimname is the names assigned to the rows and columns.
Matrix are two-dimensional array where each element has the same mode (numeric, character, or
logical). Matrices are created using the matrix function. The general format is
myymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns, byrow=logica
l_value, dimnames=list( char_vector_rownames, char_vector_colnames))

Where vector contains the elements for the matrix, nrow and ncol specify the row and column
dimensions, and dimnames contains optional row and column labels stored in character vectors.
The option byrow indicates whether the matrix should be filled in by row (byrow=TRUE) or by
column (byrow=FALSE). The default is by column. The following listing demonstrates the
matrix function.

Firstly, you create a 5x4 matrix q. Then you should create a 2x2 matrix with labels and fill the
matrix by rows w. Finally, you should create a 2x2 matrix and fill the matrix by columns e. You
can easily identify rows, columns, or elements of a matrix by using subscripts and brackets. X[i,]
refers to the ith row of matrix X, X[,j] refers to jth column, and X[i, j] refers to the ijth element,
respectively. The subscripts i and j can be numeric vectors in order to select multiple rows or
columns, as shown in the following listing.

Firstly, a 2 x 5 matrix is created containing numbers 1 to 10. By default, the matrix is filled by
column. Then the elements in the 2nd row are selected, followed by the elements in the 2nd
column. Next, the element in the 1st row and 4th column is selected. Finally, the elements in the
1st row and the 4th and 5th columns are selected. Matrices are two-dimensional and, like vectors,
can contain only one data type. When there are more than two dimensions, you’ll use arrays
(section 2.2.3). When there are multiple modes of data, you’ll use data frames (section 2.2.4)

Arrays
Arrays are very similar to matrices but in array you can have more than two dimensions. They’re
created with an array function of the following form:
myarray <- array(vector, dimensions, dimnames)

• Arrays are similar to matrices but can have more than two dimensions. They’re created
with an array function of the following form:
• myarray <- array(vector, dimensions, dimnames)
• dim1 <- c("A1", "A2")
• > dim2 <- c("B1", "B2", "B3")
• > dim3 <- c("C1", "C2", "C3", "C4")
• > z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
Here, vector contains the data for the array, dimensions are a numeric vector giving the maximal
index for each dimension, and dimnames is an optional list of dimension labels. The following
listing gives an example of creating a three-dimensional (2x3x4) array of numbers.

Data frames

• A data frame is more general than a matrix in that different columns can contain different
modes of data (numeric, character, etc.).
• It’s similar to the datasets you’d typically see in SAS, SPSS, and Stata. Data frames are
the most common data structure you’ll deal with in R

• > patientID <- c(1, 2, 3, 4)

• > age <- c(25, 34, 28, 52)
• > diabetes <- c("Type1", "Type2", "Type1", "Type1")
• > status <- c("Poor", "Improved", "Excellent", "Poor")
• > patientdata <- data.frame(patientID, age, diabetes, status)
• > patientdata
• > patientdata[1:2]
Data frames are more general than a matrix in that different columns can contain different modes
of data (numeric, character, etc.). It is very similar to the datasets you’d typically see in SAS,
SPSS, and Stata. Data frames are the most common data structure you’ll deal with in R. The
patient dataset in table 2.1 consists of numeric and character data. Because there are multiple
modes of data, you can’t contain this data in a matrix. In this case, a data frame would be the
structure of choice. A data frame is created with the data.frame() function
mydata <- data.frame(col1, col2, col3,…)
Here, col1, col2, col3, … are column vectors of any type (such as character, numeric, or logical).
Names for each column can be provided with the names function. The following listing makes
this clear.

Each column should have only one mode, but you can put columns of various modes together to
form a data framework. Because data frames are closer to what analysts think of as data sets, we
will use columns and variables that are different when discussing data frames. There are several
ways to identify data frame objects. You can use the subscription text you used before (for
example, with matric) or you can specify column names. Using the previously
created patientdata data frame, the following list shows these methods.

Factors
• variables can be described as nominal, ordinal, or continuous.
• Nominal variables are categorical, without an implied order. Diabetes (Type1, Type2) is
an example of a nominal variable. Even if Type1 is coded as a 1 and Type2 is coded as a
2 in the data, no order is implied.
• Ordinal variables imply order but not amount. Status (poor, improved, excellent) is a good
example of an ordinal variable. You know that a patient with a poor status isn’t doing as
well as a patient with an improved status, but not by how much.
• Continuous variables can take on any value within some range, and both order and amount
are implied. Age in years is a continuous variable and can take on values such as 14.5 or
22.8 and any value in between. You know that someone who is 15 is one year older than
someone who is 14.
• Categorical (nominal) and ordered categorical (ordinal) variables in R are called
factors.
• Factors are crucial in R because they determine how data will be analysed and presented
visually.
• diabetes <- c("Type1", "Type2", "Type1", "Type1")
• diabetes <- factor(diabetes)

As you can see, the variables can be defined as suggested, edited, or continuous. Appointment
flexibility is a category, with no specific order. Diabetes (Type1, Type2) is an example of
selection variability. Even if Type1 is listed as 1 and Type2 is listed as 2 in the data, no order is
specified. Normal variation means order but not value. Status (bad, improved, excellent) is a good
example of an ordinal variable. You know that a patient with a bad condition is not doing as well
as a patient with an advanced condition, but not in terms of how much. Continuous variations can
take up any value in a certain range, and both order and value are stated. Age is a constant variable
and can take values like 14.5 or 22.8 or any intermediate value. You know that a 15-year-old is
one year older than a 14-year-old. Divided (classified) and ordered (classified) in R are referred
to as elements. The factors are important to R because they determine how the data will be
analyzed and presented visually. You will see examples of this throughout the book. The function
factor () stores class values as numerical vectors in width [1 ... k] (where k is the number of values
that differ from the word variable), as well as the internal vector of character strands (real values)
on the map of these numbers. For example, suppose you have a vector
diabetes <- c("Type1", "Type2", "Type1", "Type1")

The statement diabetes <- factor(diabetes) stores this vector as (1, 2, 1, 1) and associates it with
1=Type1 and 2=Type2 internally (the assignment is alphabetical). Any analysis performed on
vector diabetes will treat the variables as determined and select mathematical methods that are
appropriate for this measurement scale. For vegetables representing ordinal variables, add the
ordered parameter = TRUE to the factor () function. You have been given a vector
status <- c("Poor", "Improved", "Excellent", "Poor")
the statement status <- factor(status, ordered=TRUE) will encode the vector as (3, 2, 1, 3) and
associate these values internally as 1=Excellent, 2=Improved, and 3=Poor. Additionally, any
analysis made on this vector will treat you as an ordinal variable and select mathematical methods
accordingly. By default, alphabetical levels are created in alphabetical order. This has worked for
the status quo, because the order "Outstanding," "Advanced," "Poor" makes sense. It would be a
problem if the “Poor” had the “Sick” codes instead, because the order would say “Sick,” “Good,”
“Improved.” There is a similar problem if the requested order is “Poor,” “Improved,” “Excellent.”
With ordered features, the alphabetical default is rarely sufficient. You can write over the default
by defining the levels option. For example,
status <- factor(status, order=TRUE, levels=c("Poor", "Improved", "Excellent"))
would assign the levels as 1=Poor, 2=Improved, 3=Excellent. Be sure that the specified levels
match your actual data values. Any data values not in the list will be set to missing. The following
listing demonstrates how specifying factors and ordered factors impact data analyses.

Firstly, you enter the data as vectors q. Then you need to specify that diabetes is a factor and
status is an ordered factor. Finally, you can combine the data into a data frame. The function
str(object)gives information on an object in R (the data frame in this case) w. It clearly shows
that diabetes is a factor and status is an ordered factor, along with how it’s coded internally. Note
that the summary() functiontreats the variables differently e. It provides the minimum, maximum,
mean, and quartiles for the continuous variable age, and frequency counts for the
categorical variables diabetes and status.

Lists

• A list is a generic object consisting of an ordered collection of objects.

• Lists are heterogeneous data structures.
• These are also one-dimensional data structures.
• A list can be a list of vectors, list of matrices, a list of characters and a list of functions and so
on.

Lists is the most complex data types in R. A list is an ordered collection of objects
(components). List allows you to gather a variety of (possibly unrelated) objects under one name.
For example, a list may contain a combination of vectors, matrices, data frames, and even other
lists. You can create a list using the list() function
mylist <- list(object1, object2, …)
Here, the objects are any of the structures seen so far. Optionally, you can name the objects in a
list:
mylist <- list(name1=object1, name2=object2, …)
The following listing shows an example.
In this example, you can create a list with four components that are: a string, a numeric vector, a
matrix, and a character vector. You can also combine any number of objects and save them as a
list. You can even specify elements of the list by indicating a component number or a name within
double brackets. In this example, mylist[[2]] and mylist[["ages"]] both refer to the same four-
element numeric vector. Lists are important R structures for two reasons. Firstly, they allow you
to organize and recall disparate information in a simple way. Secondly, the results of many R
functions return lists. It’s up to the analyst to pull out the components that are needed. You’ll see
numerous examples of functions that return lists in later chapters.

2.3 Data input

Now that you have data structures, you would need to put some data in them! A data analyst gets
typically faced with data that comes to them from a variety of sources and in a variety of
formats. Their task is to import the data into their tools, analyze the data, then report the results.
R provides a variety of data import tools. R can import data from the keyboard, from flat files,
from Microsoft Excel and Access, from popular statistical packages, from special object formats,
and from a variety of related data management systems. Because you never know where your
data will come from, we’ll look at each of them here. You only need to learn about the ones you
will be using.
Entering data from the keyboard
The easiest way to enter data is from the keyboard. The edit function () in R will require a text
editor that will allow you to manually enter your data. Here are the steps involved:
1. Create a blank data frame (or matrix) with the changing names and modes you want to
have in the database.
2. Request a text editor for this data object, enter your data, and save the results to the data
object.

In the following example, you’ll create a data frame named mydata with three variables: age
(numeric) , gender (character) , and weight (numeric) .

You’ll then invoke the text editor, add your data, and save the results.
mydata <- data.frame(age=numeric(0),
gender=character(0), weight=numeric(0))
mydata <- edit(mydata)

Assignments like age=numeric(0) create a variable of a specific mode, but without actual data.
Note that the result of the editing is assigned back to the object itself. The edit() function
operates on a copy of the object. If you don’t assign it a destination, all of your edits will be lost!

Importing data from a delimited text file

You can also import data from delimited text files using read.table(), a function that reads a file
in table format and saves it as a data frame.

Here’s the syntax:

mydataframe <- read.table(file, header=logical_value,
sep="delimiter", row.names="name")

here file is a delimited ASCII file , header is a logical value indicating whether the first row
contains variable names (TRUE or FALSE), sep specifies the delimiter separating data values,
and row.names is an optional parameter specifying one or more variables to represent row
identifiers. For example, the statement :

grades <- read.table("studentgrades.csv", header=TRUE, sep=",", row.names="STUDENTID")

reads a comma-separated file called studentgrades.csv from the current work directory, detects
dynamic words in the first line of the file, specifies STUDENTID variables as a line identifier,
and saves results as a data framework called distances.

Importing data from Excel

The best way for reading an Excel file is to export it to a comma-delimited file from within Excel
and import it to R using the method described earlier.

Excel 2007 uses XLSX file format, which is a zipped set of XML files. The xlsx package can be
used to access spreadsheets this way. Be sure to download and install it before using it first. The
Read.xlsx () function imports a worksheet from the XLSX file to the data framework. The
simplest format is read.xlsx (file, n) where the file is the path to the Excel 2007 workbook and n
is the worksheet number to be imported. For example, on Windows platform, code

library(xlsx)
workbook <- "c:/myworkbook.xlsx"
mydataframe <- read.xlsx(workbook, 1)
Firstly, imports the first worksheet from the workbook myworkbook.xlsx stored on the C: drive
and saves it as the data frame mydataframe. The xlsx packagecan can do more than import
worksheets. It can create and also manipulate Excel XLSX files as well. Programmers who are
interested to develop an interface between R and Excel should check out this relatively new
package.
Importing data from CSV

we will read data in r by loading a CSV file from Stress-Lysis. “Humidity – Temperature – Step
count – Stress levels” represents the titles for Stress-Lysis.csv file.

You can download data from following link:

https://www.kaggle.com/datasets/laavanya/stress-level-detection

data1 <- read_csv(Stress-Lysis.csv')

head(data1, 5)
Importing data from JSON
we will load JSON into R using a file from the Drake Lyrics dataset. It contains lyrics, song
title, album title, URL, and view count of Drake songs. You can download data from following
link:
https://www.kaggle.com/datasets/juicobowley/drake-lyrics

library(rjson)
JsonData <- fromJSON(file = 'drake_data.json')
print(JsonData[1])

Importing data from XML

Just like the `read_csv` function, we can load the XML data by providing a URL link to the XML
site. It will load the page and parse XML data.

library(xml2)
plant_xml <- read_xml('https://www.w3schools.com/xml/plant_catalog.xml')
plant_xml_parse <- xmlParse(plant_xml)

The XML file can be read after installing the package and then parsing it with xmlparse()
function, which takes as input the XML file name and prints the content of the file in the form of
a list. The file should be located in the current working directory. An additional package named
‘methods’ should also be installed. The following code can be used to read the contents of the
file “sample.xml”.

library("XML")
library("methods")

# the contents of sample.xml are parsed

data <- xmlParse(file = "sample.xml")
print(data)
2.4 Useful functions for working with data objects
Let us see summary of useful functions for working with data objects in following table:
Table: useful functions for working with data objects

We’ve already discussed most of these functions. The functions head() and tail() are useful for
quickly scanning large datasets. For example, head(patientdata) lists the first six rows of the data
frame, whereas tail(patientdata) lists the last six. We’ll cover functions such as length(), cbind(),
and rbind() in the next chapter. They’re gathered here as a reference.

Summary
One of the most challenging tasks in data analysis is data preparation. We got off to a good start
in this chapter by describing the various R structures that provide data storage and the many
available ways to import data from both keyboard and external sources. Specifically, we will use
vector definitions, matrix, data frame, and write over and over again in future chapters. Your
ability to decipher the properties of these properties with bracket notation will be invaluable in
selecting, setting, and converting data.

As you can see, R provides a wealth of activities for accessing external data. This includes data
from flat files, web files, statistical packages, spreadsheets, and details. Although the focus of this
chapter is to import data into R, you can also transfer data from R into these external formats.
2.5 R – Programming Constructs:
Decision making

Decision making structures require the programmer to specify one or more conditions to be evaluated or
tested by the program, along with a statement or statements to be executed if the condition is determined
to be true, and optionally, other statements to be executed if the condition is determined to be false.
Following is the general form of a typical decision making structure found in most of the programming
languages:

R provides the following types of decision making statements.

if statement:

An if statement consists of a Boolean expression followed by one or more statements.

if...else statement :

An if statement can be followed by an optional else statement, which executes when the Boolean
expression is false.

2.5.1.1 Syntax The basic syntax for creating an if statement in R is:

if(boolean_expression)

{ // statement(s) will execute if the boolean expression is true. }

If the Boolean expression evaluates to be true, then the block of code inside the if statement will be
executed. If Boolean expression evaluates to be false, then the first set of code after the end of the if
statement (after the closing curly brace) will be executed

Example:

x <- 30L
if(is.integer(x))
{ print("X is an Integer") }
When the above code is compiled and executed, it produces the following result:

[1] "X is an Integer"

2.5.1.2 Syntax The basic syntax for creating an if...else statement in R is:

if(boolean_expression)
{ // statement(s) will execute if the boolean expression is true. }
else
{ // statement(s) will execute if the boolean expression is false. }

If the Boolean expression evaluates to be true, then the if block of code will be executed, otherwise else
block of code will be executed.

Example:

x <- c("what","is","truth")

if("Truth" %in% x)
{ print("Truth is found") }
else
{ print("Truth is not found") }

When the above code is compiled and executed, it produces the following result:

[1] "Truth is not found"

2.5.1.3 R –Switch Statement

A switch statement allows a variable to be tested for equality against a list of values. Each value is called
a case, and the variable being switched on is checked for each case.

Syntax The basic syntax for creating a switch statement in R is :

switch(expression, case1, case2, case3....)

The following rules apply to a switch statement:

 If the value of expression is not a character string it is coerced to integer.


 You can have any number of case statements within a switch. Each
case is followedby the value to be compared to and a colon.

 If the value of the integer is between 1 and nargs()-1 (The max number
of arguments)then the corresponding element of case condition is
evaluated and the result returned.

 If expression evaluates to a character string then that string is

matched (exactly) tothe names of the elements.
 If there is more than one match, the first matching element is returned.

 No Default argument is available.

 In the case of no match, if there is a unnamed element of ... its value
is returned. (Ifthere is more than one such argument an error is
returned.)

Flow Diagram

2.5.1.4 R LOOPs

There may be a situation when you need to execute a block of code several number of times. In general,
statements are executed sequentially. The first statement in a function is executed first, followed by the
second, and so on. Programming languages provide various control structures that allow for more
complicated execution paths.
A loop statement allows us to execute a statement or group of statements multiple times and the following
is the general form of a loop statement in most of the programming languages:
R programming language provides the following kinds of loop to handle looping
requirements.Click the following links to check their detail.

Loop Type Description

Executes a sequence of statements multiple times and abbreviates thecode that manages
repeat loop
the loop variable.

Repeats a statement or group of statements while a given condition istrue. It tests the
while loop
condition before executing the loop body.

Like a while statement, except that it tests the condition at the end of the loop body.
for loop

Syntax
The basic syntax for creating a repeat loop in R is:

repeat {
commands
if(condition ){
break
}
}
The basic syntax for creating a while loop in R is :

while
(test_expression) {
statement
}

The basic syntax for creating a for loop statement in R is:

for (value in vector) {

statements
}

R’s for loops are particularly flexible in that they are not limited to integers, or even numbersin the input.
We can pass character vectors, logical vectors, lists or expressions.

Example

v <- LETTERS[1:4]
for ( i in v)
{ print(i)
}
R Programming

3 User defined functions in R.

A function is a set of statements organized together to perform a specific task. R has a largenumber of in-
built functions and the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with
arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which
may be stored in other objects.

An R function is created by using the keyword function. The basic syntax of an R function definition
is as follows:

function_name <- function(arg_1, arg_2, ...)

{ Function body
}

3.1 FunctionComponents
The different parts of a function are:
Function Name: This is the actual name of the function. It is stored in R environmentas an object with this name.
Arguments: An argument is a placeholder. When a function is invoked, you pass a value to the argument.
Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.
Function Body: The function body contains a collection of statements that defineswhat the function does.
Return Value: The return value of a function is the last expression in the function body to be evaluated.

R has many in-built functions which can be directly called in the program without definingthem first.
We can also create and use our own functions referred as user defined functions.

3.2 Built-inFunction

# Create a sequence of numbers from 32 to

print(seq(32,44))

# Find mean of numbers from 25 to 82.

print(mean(25:82))

# Find sum of numbers frm 41 to

print(sum(41:68))

Simple examples of in-built functions are seq(), mean(), max(), sum(x)and paste(...)
etc.They are directly called by user written programs.
3.3 User-definedFunction
We can create user-defined functions in R. They are specific to what a user wants and once created they
can be used like the built-in functions. Below is an example of how a function is created and used.

# Create a function to print squares of numbers in

sequence. new.function <- function(a) {
for(i in 1:a) {
b <- i^2 print(b)
}
}

3.4 CallingaFunction

new.function <-
function(a) { for(i in
1:a) {
b <- i^2 print(b)
}
}

# Call the function new.function supplying 6 as an argument.

new.function(6)

3.4.1.1 Calling a Function with Argument Values (by position and by name)
The arguments to a function call can be supplied in the same sequence as defined in the function or they
can be supplied in a different sequence but assigned to the names of the arguments.
# Create a function with arguments.

new.function <- function(a,b,c) {

result <- a*b+c
print(result)
}

# Call the function by position of arguments.

new.function(5,3,11)

# Call the function by names of the arguments.

new.function(a=11,b=5,c=3)
4 Graphical Analysis using R
Topics to be covered:
Basic Plotting
Manipulating the plotting window
BoxWhisker Plots
Scatter Plots
Pair Plots
Pie Charts
Bar Charts.
4.1 Introduction:
Whenever we analyze data, the first thing that we should do is look at it. For each variable,
what are the most common values? How much of a difference is there? Are there any
unusual observations? R provides many data visualization functions. In this chapter, we’ll
look at graphs that help you understand a single categorical or continuous variable.

This module includes

■ visualizing the distribution of variable
■ Comparing groups on an outcome variable

In both cases, the variable could be continuous (for example, car mileage as miles per
gallon) or categorical (for example, treatment outcome as none, some, or marked). In later
chapters, we will examine graphs showing the bivariate and multivariate relationships
between variables. In the following sections, we’ll examine the use of bar plots, pie charts,
fan charts, histograms, kernel density plots, box plots, violin plots, and dot plots. Some of
these may be familiar to you, whereas others (such as fan plots or violin plots) may be
new to you. Our goal, as constantly, is to apprehend your facts higher and to communicate
this information to others.

4.2 Bar plots

Bar plots display the distribution (frequencies) of a categorical variable
through vertical or horizontal bars. In its simplest form, the format of
the barplot() function is

barplot(height)

where height is a vector or matrix.

In the following examples, we’ll plot the outcome of a study investigating a new treatment
for rheumatoid arthritis. The data are contained in the Arthritis data frame distributed with
the vcd package. Because the vcd package isn’t included in the default R installation, be
sure to download and install it before first use (install. packages("vcd")). Note that
the vcd package isn’t needed to create bar plots. We’re loading it in order to gain access
to the Arthritis dataset.
Simple bar plots:

If height is a vector, the values determine the heights of the bars in the plot and a vertical
bar plot is produced. Including the option horiz=TRUE produces a horizontal bar chart
instead. You can also add annotating options. The main option adds a plot title, whereas
the xlab and ylab options add x-axis and y-axis labels, respectively.

In the Arthritis study, the variable Improved records the patient outcomes for individuals
receiving a placebo or drug.
1. library(vcd)
2. counts <- table(Arthritis$Improved)
3. counts
None Some Marked
42 14 28

Here, we see that 28 patients showed marked improvement, 14 showed some

improvement, and 42 showed no improvement. You can graph the variable counts using
a vertical or horizontal bar plot. The code is provided in the following listing and the
resulting graphs are displayed in below figure:

Figure: Simple vertical and horizontal bar charts

Stacked and grouped bar plots:

If height is a matrix instead of a vector, the resulting graph will be a stacked or grouped
bar plot. If beside=FALSE (the default), then each column of the matrix produces a bar
in the plot, with the values in the column giving the heights of stacked “sub-bars.”
If beside=TRUE, each column of the matrix represents a group, and the values in each
column are juxtaposed instead of stacked.
Consider the cross-tabulation of treatment type and improvement status:
1. library(vcd)
2. Counts <-table(Arthritis$Improved, Arthritis$Treatment)
3. Counts Treatment

The first barplot function produces a stacked bar plot, whereas the second produces a
grouped bar plot. We’ve also added the col option to add color to the bars plotted.
The legend.text parameter provides bar labels for the legend (which are only useful
when height is a matrix).

Mean bar plots:

Bar plots needn’t be based on counts or frequencies. You can create bar plots that
represent means, medians, standard deviations, and the rest by using the
aggregate function and passing the results to the barplot() function. The following list
illustrates an example,
The above Listing sorts the means from smallest to largest q. Also note that use
of the title() function w is equivalent to adding the main option in the plot
call. means$x is the vector containing the heights of the bars, and the
option names.arg=means$Group.1 is added to provide labels.

Tweaking bar plots:

There are several ways to tweak the appearance of a bar plot. For example, with
multiple bars, labels can start to overlap. You can decrease the font size using the cex.
names option. Specifying values smaller than 1 will reduce the size of the
labels. Voluntarily, the names.arg argument allows you to specify a character vector of
names used to label the bars. You can also use graphical parameters to help text spacing.
An example is given in the following list with the output displayes in below figure:

In this example, we’ve rotated the bar labels (with las=2), changed the label text, and both
increased the size of the y margin (with mar) and decreased the font size in order to fit the
labels comfortably (using cex.names=0.8). The par() function allows you to make
extensive modifications to the graphs that R produces by default. See chapter 3 for more
details.

Spinograms:
Before finishing our discussion of bar plots, let’s take a look at a specialized version called
a spinogram. In a spinogram, a stacked bar plot is rescaled so that the height of each bar
is 1 and the segment heights represent proportions. Spinograms are created through
the spine() function of the vcd package. The following code produces a
simple spinogram:

library(vcd)
attach(Arthritis)
counts <- table(Treatment,
Improved) spine(counts,
main="Spinogram Example")
detach(Arthritis)

Figure: Spinogram of arthritis treatment outcome

4.3 Box plots:
A “box-and-whiskers” plot describes the distribution of a continuous variable
by ploting its five-number summary: the minimum, lower quartile (25th percentile),
median (50th percentile), upper quartile (75th percentile), and maximum. It can also
indicate potential views outliers (values outside the range of ± 1.5*IQR, where IQR is the
interquartile range defined as the upper quartile minus the lower quartile).

For example:
boxplot(mtcars$mpg, main="Box plot", ylab="Miles per Gallon")

produces the plot shown in below figure.

By default, each whisker extends to the most extreme data point, which is no more than
the 1.5 times the interquartile range for the box. Values outside this range are depicted as
dots (not shown here).

For example, in our sample of cars the median mpg is 19.2, 50 percent of the scores fall
between 15.3 and 22.8, the smallest value is 10.4, and the largest value is 33.9. How did
I read this so precisely from the graph? Issuing boxplot.stats(mtcars$mpg) prints the
statistics used to build the graph. There doesn’t appear to be any outliers, and there is a
mild positive skew (the upper whisker is longer than the lower whisker).

Using parallel box plots to compare

groups:
Box plots can be created for individual variables or for variables by group. The format is

boxplot(formula, data=dataframe)

where formula is a formula and dataframe denotes the data frame (or list) providing the
data. An example of a formula is y ~ A, where a separate box plot for numeric variable y is
generated for each value of categorical variable A. The formula y ~ A*B would produce
a box plot of numeric variable y, for each combination of levels in categorical
variables A and B.

Adding the option varwidth=TRUE will make the box plot widths proportional to the
square root of their sample sizes. Add horizontal=TRUE to reverse the axis orientation.
In the following code, we revisit the impact of four, six, and eight cylinders on auto
mpg with parallel box plots. The plot is provided in below figure.

boxplot(mpg ~ cyl, data=mtcars,

main="Car Mileage Data",
xlab="Number of Cylinders",
ylab="Miles Per Gallon")

You can see in above figure that there’s a good separation of groups based on gas mile-
age. You can also see that the distribution of mpg for six-cylinder cars is
more symmetrical than for the other two car types. Cars with four cylinders show the
greatest spread (and positive skew) of mpg scores, when compared with six- and eight-
cylinder cars. There’s also an outlier in the eight-cylinder group.

Box plots are very versatile. By adding notch=TRUE, you get notched box plots. If two
boxes’ notches don’t overlap, there’s strong evidence that their medians differ .The
following code will create notched box plots for our mpg example:

boxplot(mpg ~ cyl, data=mtcars,

notch=TRUE,
varwidth=TRUE,
col="red",
main="Car Mileage Data",
xlab="Number of Cylinders",
ylab="Miles Per Gallon")

The col option fills the box plots with a red color, and varwidth=TRUE produces box
plots with widths that are proportional to their sample sizes. You can see in below figure
that the median car mileage for four-, six-, and eight- cylinder cars differ. Mileage is
significantly reduced by the number of cylinders.

Finally, you can produce box plots for more than one grouping factor. Listing code
provides box plots for mpg versus the number of cylinders and transmission type in an
automobile. Again, you use the col option to fill the box plots with color. Note that colors
recycle. In this case, there are six box plots and only two specified colors, so the colors
repeat three times.

From below figure it’s again clear that median mileage decreases with cylinder number.
For four and six- cylinder cars, mileage is higher for standard transmissions. But for eight-
cylinder cars there doesn’t appear to be a difference. You can also see from the widths of
the box plots that standard four-cylinder and automatic eight - cylinder cars are the most
common in this dataset.
4.4 Dot plots:

Dot plots provide a method of plotting a large number of labeled values on a simple
horizontal scale. You create them with the dotchart() function, using the format

dotchart(x, labels=)

where x is a numeric vector and labels specifies a vector that labels each point. You can
add a groups option to designate a factor specifying how the elements of x are grouped. If
so, the option gcolor controls the color of the groups label and cex controls the size of
the labels. Here’s an example with the mtcars dataset:

dotchart(mtcars$mpg, labels=row.names(mtcars), cex=.7, main="Gas Mileage for Car

Models",
xlab="Miles Per Gallon")

The resulting plot is given in below figure:

The graph in figure allows you to see the mpg for each make of car on the same horizontal
axis. Dot plots typically become most interesting when they’re sorted and grouping
factors are distinguished by symbol and color. An example is given in the following
listing.

In this example, the data frame mtcars is sorted by mpg (lowest to highest) and saved as
data frame x. The numeric vector cyl is transformed into a factor. A character vector
(color) is added to data frame x and contains the values "red", "blue", or "dark-
green" depending on the value of cyl. In addition, the labels for the data points are taken
from the row names of the data frame (car makes). Data points are grouped by number of
cylinders. The numbers 4, 6, and 8 are printed in black. The color of the points and labels
are derived from the color vector, and points are represented by filled circles. The code
produces the graph in below figure:

In above figure ,a number of features become evident for the first time. Again, you see an
increase in gas mileage as the number of cylinders decrease. But you also see exceptions.
For example, the Pontiac Firebird, with eight cylinders, gets higher gas mileage than the
Mercury 280C and the Valiant, each with six cylinders. The Hornet 4 Drive, with six
cylinders, gets the same miles per gallon as the Volvo 142E, which has four cylinders.
It’s also clear that the Toyota Corolla gets the best gas mileage by far, whereas the Lincoln
Continental and Cadillac Fleetwood are outliers on the low end. You can gain significant
insight from a dot plot in this example because each point is labeled, the value of each
point is inherently meaningful, and the points are arranged in a manner that promotes
comparisons. But as the number of data points increase, the utility of the dot plot
decreases.

4.5 Pie charts:

Whereas pie charts are ubiquitous in the business world, they’re denigrated by most
statisticians, including the authors of the R documentation. They recommend bar or dot
plots over pie charts because people are able to judge length more accurately than volume.
Perhaps for this reason, the pie chart options in R are quite limited when compared with
other statistical software.

Pie charts are created with the function

pie(x, labels)

where x is a non-negative numeric vector indicating the area of each slice

and labels provides a character vector of slice labels. Four examples are given in the next
listing; the resulting plots are provided in below figure:
First you set up the plot so that four graphs are combined into one q. Then you input the
data that will be used for the first three graphs.

For the second pie chart w, you convert the sample sizes to percentages and add the
information to the slice labels. The second pie chart also defines the colors of the slices
using the rainbow() function. Here rainbow(length(lbls2)) resolves to rainbow(5),
providing five colors for the graph.

The third pie chart is a 3D chart created using the pie3D() function from
the plotrix package. Be sure to download and install this package before using it for the
first time. If statisticians dislike pie charts, they positively despise 3D pie charts (although
they may secretly find them pretty). This is because the 3D effect adds no additional
insight into the data and is considered distracting eye candy.

The fourth pie chart demonstrates how to create a chart from a table e. In this
case, you count the number of states by US region, and append the information to the
labels before producing the plot.

Pie charts make it difficult to compare the values of the slices (unless the values are
appended to the labels). For example, looking at the simple pie chart, can you tell how
the US compares to Germany? (If you can, you’re more perceptive than I am.) In an
attempt to improve on this situation, a variation of the pie chart, called a fan plot, has been
developed. The fan plot (Lemon & Tyagi, 2009) provides the user with a way to display
both relative quantities and differences. In R, it’s implemented through
the fan.plot() function in the plotrix package.

Consider the following code and the resulting graph :

library(plotrix)
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France") fan.plot(slices, labels = lbls,
main="Fan Plot")

In a fan plot, the slices are rearranged to overlap each other and the radii have been
modified so that each slice is visible. Here you can see that Germany is the largest slice

and that the US slice is roughly 60 percent as large. France appears to be half as large as
Germany and twice as large as Australia. Remember that the width of the slice and not
the radius is what’s important here.
As you can see, it’s much easier to determine the relative sizes of the slice in a fan plot
than in a pie chart. Fan plots haven’t caught on yet, but they’re new. Now that we’ve
covered pie and fan charts, let’s move on to histograms. Unlike bar plots and pie charts,
histograms describe the distribution of a continuous variable.
5 Advanced R
Topics to be covered:
Statistical models in R
Correlation and regression analysis
Analysis of Variance (ANOVA)
creating data for complex analysis
Summarizing data, and case studies
5.1 Correlations:

Correlation coefficients are used to describe relationships among quantitative variables. The
sign ± indicates the direction of the relationship (positive or inverse) and the magnitude
indicates the strength of the relationship (ranging from 0 for no relationship to 1 for a perfectly
predictable relationship).
Types of correlations:
R can produce a variety of correlation coefficients, including Pearson, Spearman, Kendall,
partial, polychoric, and polyserial. Let’s look at each in turn.
PEARSON, SPEARMAN, AND KENDALL CORRELATIONS
The Pearson product moment correlation assesses the degree of linear relationship between
two quantitative variables. Spearman’s Rank Order correlation coefficient assesses the degree
of relationship between two rank-ordered variables. Kendall’s Tau is also a nonparametric
measure of rank correlation.
The cor() function produces all three correlation coefficients, whereas the cov() function
provides covariances. There are many options, but a simplified format for producing
correlations is
cor(x, use= , method= )

The default options are use="everything" and method="pearson". You can see an example in
the following listing.
The first call produces the variances and covariances. The second provides Pearson Product
Moment correlation coefficients, whereas the third produces Spearman Rank Order correlation
coefficients. You can see, for example, that a strong positive correlation exists between income
and high school graduation rate and that a strong negative correlation exists between illiteracy
rates and life expectancy. Notice that you get square matrices by default (all variables crossed
with all other variables).
PARTIAL CORRELATIONS:
A partial correlation is a correlation between two quantitative variables, controlling for one or
more other quantitative variables. You can use the pcor() function in the ggm package to provide
partial correlation coefficients. The ggm package isn’t installed by default, so be sure to install
it on first use.
The format is
pcor(u, S)
where u is a vector of numbers, with the first two numbers the indices of the variables to be
correlated, and the remaining numbers the indices of the conditioning variables (that is, the
variables being partialed out). S is the covariance matrix among the variables. An example will
help clarify this:
1. library(ggm)
2. # partial correlation of population and murder rate, controlling
3. # for income, illiteracy rate, and HS graduation rate
4. pcor(c(1,5,2,3,6), cov(states)) [1] 0.346
In this case, 0.346 is the correlation between population and murder rate, controlling for the
influence of income, illiteracy rate, and HS graduation rate. The use of partial correlations is
common in the social sciences.
OTHER TYPES OF CORRELATIONS
The hetcor() function in the polycor package can compute a heterogeneous correlation matrix
containing Pearson product-moment correlations between numeric variables, polyserial
correlations between numeric and ordinal variables, polychoric correlations between ordinal
variables, and tetrachoric correlations between two dichotomous variables. Polyserial,
polychoric, and tetrachoric correlations assume that the ordinal or dichotomous variables are
derived from underlying normal distribu- tions. See the documentation that accompanies this
package for more information.
5.2 Testing correlations for significance:
Once you’ve generated correlation coefficients, how do you test them for statistical sig-
nificance? The typical null hypothesis is no relationship (that is, the correlation in the population
is 0). You can use the cor.test() function to test an individual Pearson,
Spearman, and Kendall correlation coefficient. A simplified format is
cor.test(x, y, alternative = , method = )
where x and y are the variables to be correlated, alternative specifies a two-tailed or one- tailed
test ("two.side", "less", or "greater") and method specifies the type of correlation ("pearson",
"kendall", or "spearman") to compute. Use alternative="less" when the research hypothesis is
that the population correlation is less than 0. Use alternative="greater" when the research
hypothesis is that the population correlation is greater than 0. By default,
alternative="two.side" (population correlation isn’t equal to 0) is assumed. See the following
listing for an example.

This code tests the null hypothesis that the Pearson correlation between life expectancy and
murder rate is 0. Assuming that the population correlation is 0, you’d expect to see a sample
correlation as large as 0.703 less than 1 time out of 10 million (that is, p = 1.258e-08). Given how
unlikely this is, you reject the null hypothesis in favor of the research hypothesis, that the
population correlation between life expectancy and murder rate is not 0.
Unfortunately, you can test only one correlation at a time using cor.test. Luckily, the
corr.test() function provided in the psych package allows you to go further. The corr.test()
function produces correlations and significance levels for matrices of Pearson, Spearman, or
Kendall correlations. An example is given in the following listing.
The use= options can be "pairwise" or "complete" (for pairwise or listwise dele- tion of
missing values, respectively). The method= option is "pearson" (the default), "spearman", or
"kendall". Here you see that the correlation between population size and high school graduation
rate (–0.10) is not significantly different from 0 (p = 0.5).
5.3 Regression:
In many ways, regression analysis lives at the heart of statistics. It’s a broad term for a set of
methodologies used to predict a response variable (also called a dependent, criterion, or
outcome variable) from one or more predictor variables (also called independent or explanatory
variables). In general, regression analysis can be used to identify the explanatory variables that
are related to a response variable, to describe the form of the relationships involved, and to
provide an equation for predicting the response variable from the explanatory variables.
The many faces of regression:

The term regression can be confusing because there are so many specialized varieties refer
below table. In addition, R has powerful and comprehensive features for fitting regression
models, and the abundance of options can be confusing as well.
Scenarios for using OLS regression:
In OLS regression, a quantitative dependent variable is predicted from a weighted sum of
predictor variables, where the weights are parameters estimated from the data. Let’s take a
look at a concrete example (no pun intended), loosely adapted from Fwa (2006). An engineer
wants to identify the most important factors related to bridge deterioration (such as age,
traffic volume, bridge design, construction materials and methods, construction quality, and
weather conditions) and determine the mathematical form of these relationships. She
collects data on each of these variables from a representative sample of bridges and models
the data using OLS regression.
The approach is highly interactive. She fits a series of models, checks their compliance with
underlying statistical assumptions, explores any unexpected or aberrant findings, and finally
chooses the “best” model from among many possible models. If successful, the results will
help her to
1. Focus on important variables, by determining which of the many collected variables are
useful in predicting bridge deterioration, along with their relative importance.
2. Look for bridges that are likely to be in trouble, by providing an equation that can be
used to predict bridge deterioration for new cases (where the values of the predictor
variables are known, but the degree of bridge deterioration isn’t).
3. Take advantage of serendipity, by identifying unusual bridges. If she finds that some
bridges deteriorate much faster or slower than predicted by the model, a study of these
“outliers” may yield important findings that could help her to understand the
mechanisms involved in bridge deterioration.
5.3.2.1 OLS regression:

For most of this chapter, we’ll be predicting the response variable from a set of pre-
dictor variables (also called “regressing” the response variable on the predictor
variables—hence the name) using OLS.

Our goal is to select model parameters (intercept and slopes) that minimize the difference
between actual response values and those predicted by the model. Specifically, model
parameters are selected to minimize the sum of squared residuals

To properly interpret the coefficients of the OLS model, you must satisfy a number of statistical
assumptions:
1. Normality —For fixed values of the independent variables, the dependent variable is
normally distributed.
2. Independence —The Yi values are independent of each other.
3. Linearity —The dependent variable is linearly related to the independent variables.
4. Homoscedasticity —The variance of the dependent variable doesn’t vary with the levels
of the independent variables. We could call this constant variance, but saying homoscedasticity
makes me feel smarter.

If you violate these assumptions, your statistical significance tests and confidence intervals may
not be accurate. Note that OLS regression also assumes that the independent variables are
fixed and measured without error, but this assumption is typically relaxed in practice.
Simple linear regression:
The dataset women in the base installation provides the height and weight for a set of 15
women ages 30 to 39. We want to predict weight from height. Having an equation for
predicting weight from height can help us to identify overweight or underweight individuals.
The analysis is provided in the following listing, and the resulting graph is shown in figure.
From the output, you see that the prediction equation is
Weight = - 87.52 + 3.45 * Height
Because a height of 0 is impossible, you wouldn’t try to give a physical interpretation to the
intercept. It merely becomes an adjustment constant. From the Pr(>|t|) column, you see that
the regression coefficient (3.45) is significantly different from zero (p < 0.001) and indicates
that there’s an expected increase of 3.45 pounds of weight for every 1 inch increase in height.
The multiple R-squared (0.991) indicates that the model accounts for 99.1 percent of the
variance in weights. The multiple R-squared is also the squared correlation between the actual
and predicted value (that is, R2 = r 2 ). The residual standard error (1.53 lbs.) can be thought of
as the average error in
predicting weight from height using this model. The F statistic tests whether the predictor
variables taken together, predict the response variable above chance levels. Be- cause there’s
only one predictor variable in simple regression, in this example the F test is equivalent to the
t-test for the regression coefficient for height.
For demonstration purposes, we’ve printed out the actual, predicted, and residual values.
Evidently, the largest residuals occur for low and high heights, which can also be seen in the
plot in above figure.
5.4 Polynomial regression:

The plot in the above figure suggests that you might be able to improve your prediction using a
regression with a quadratic term (that is, X 2).
You can fit a quadratic equation using the statement
fit2 <- lm(weight ~ height + I(height^2), data=women)
The new term I(height^2) requires explanation. height^2 adds a height-squared term to the
prediction equation. The I function treats the contents within the paren- theses as an R regular
expression. You need this because the ^ operator has a special meaning in formulas that you
don’t want to invoke here .
Listing below shows the results of fitting the quadratic equation.
and both regression coefficients are significant at the p < 0.0001 level. The amount of variance
accounted for has increased to 99.9 percent. The significance of the squared term (t = 13.89, p
< .001) suggests that inclusion of the quadratic term improves the model fit. If you look at the
plot of fit2 in below figure you can see that the curve does indeed provides a better fit.

5.5 Fitting ANOVA models:

Although ANOVA and regression methodologies developed separately, functionally they’re

both special cases of the general linear model. we’ll primarily use the aov() function in this
chapter. The results of lm() and aov() are equivalent, but the aov() function presents these
results in a format that’s more familiar to ANOVA methodologists.
The aov() function:
The syntax of the aov() function is aov(formula, data=dataframe). The below table describes
special symbols that can be used in the formulas. In this table, y is the dependent variable and
the letters A, B, and C represent factors.

Table below provides formulas for several common research designs. In this table, low- ercase
letters are quantitative variables, uppercase letters are grouping factors, and Subject is a
unique identifier variable for subjects.

 The order of formula terms:

The order in which the effects appear in a formula matters when (a) there’s more than one
factor and the design is unbalanced, or (b) covariates are present. When either of these two
conditions is present, the variables on the right side of the equation will be correlated with
each other. In this case, there’s no unambiguous way to divide up their impact on the
dependent variable. For example, in a two-way ANOVA with unequal numbers of observations
in the treatment combinations, the model y ~ A*B will not produce the same results as the
model y ~ B*A.
By default, R employs the Type I (sequential) approach to calculating ANOVA effects (see the
sidebar “Order counts!”). The first model can be written out as y ~ A + B + A:B. The resulting R
ANOVA table will assess
1. The impact of A on y
2. The impact of B on y, controlling for A
3. The interaction of A and B, controlling for the A and B main effects
The greater the imbalance in sample sizes, the greater the impact that the order of the terms
will have on the results. In general, more fundamental effects should be listed earlier in the
formula. In particular, covariates should be listed first, followed by main effects, followed by
two-way interactions, followed by three-way interactions, and so on. For main effects, more
fundamental variables should be listed first. Thus gender would be listed before treatment.
5.5.1.1 One-way ANOVA:
In a one-way ANOVA, you’re interested in comparing the dependent variable means of two or
more groups defined by a categorical grouping factor. Our example comes from the cholesterol
dataset in the multcomp package, and taken from Westfall, To- bia, Rom, & Hochberg (1999).
Fifty patients received one of five cholesterol-reducing drug regiments (trt). Three of the
treatment conditions involved the same drug administered as 20 mg once per day (1time),
10mg twice per day (2times), or 5 mg four times per day (4times). The two remaining conditions
(drugD and drugE) represented competing drugs. Which drug regimen produced the greatest
cholesterol reduction (response)? The analysis is provided in the following listing.
Looking at the output, you can see that 10 patients received each of the drug regiments .
From the means, it appears that drugE produced the greatest cholesterol reduction, whereas
1time produced the least . Standard deviations were relatively constant across the five
groups, ranging from 2.88 to 3.48 . The ANOVA F test for treatment (trt) is significant (p <
.0001), providing evidence that the five treatments aren’t all equally effective . The
plotmeans() function in the gplots package can be used to produce a graph of group means and
their confidence intervals . A plot of the treatment means, with 95 percent confidence limits,
is provided in the below and allows you to clearly see these treatment differences.

5.5.1.2 One-way ANCOVA:

A one-way analysis of covariance (ANCOVA) extends the one-way ANOVA to include one or
more quantitative covariates. This example comes from the litter dataset in the multcomp
package (see Westfall et al., 1999). Pregnant mice were divided into four treatment groups;
each group received a different dose of a drug (0, 5, 50, or 500). The mean post-birth weight
for each litter was the dependent variable and gestation time was included as a covariate. The
analysis is given in the following listing.
From the table() function you can see that there are an unequal number of litters at each
dosage level, with 20 litters at zero dosage (no drug) and 17 litters at dosage 500. Based on the
group means provided by the aggregate() function, the no-drug group had the highest mean
litter weight (32.3). The ANCOVA F tests indicate that (a) gestation time was related to birth
weight, and (b) drug dosage was related to birth weight after controlling for gestation time.
The mean birth weight isn’t the same for each of the drug dosages, after controlling for
gestation time.
Because you’re using a covariate, you may want to obtain adjusted group means— that is, the
group means obtained after partialing out the effects of the covariate. You can use the effect()
function in the effects library to calculate adjusted means:
1. library(effects)
2. effect("dose", fit)

dose effect
dose
0 5 50 500
32.4 28.9 30.6 29.3
In this case, the adjusted means are similar to the unadjusted means produced by the
aggregate() function, but this won’t always be the case. The effects package provides a
powerful method of obtaining adjusted means for complex research designs and presenting
them visually. See the package documentation on CRAN for more details.
As with the one-way ANOVA example in the last section, the F test for dose indicates that the
treatments don’t have the same mean birth weight, but it doesn’t tell you which means differ
from one another. Again you can use the multiple comparison procedures provided by the
multcomp package to compute all pairwise mean comparisons. Additionally, the multcomp
package can be used to test specific user- defined hypotheses about the means.
 Assessing test assumptions
ANCOVA designs make the same normality and homogeneity of variance assumptions
described for ANOVA designs. In addition, standard ANCOVA designs assumes homogeneity
of regression slopes. In this case, it’s assumed that the regression slope for predicting birth
weight from gestation time is the same in each of the four treatment groups. A test for the
homogeneity of regression slopes can be obtained by including a gestation*dose interaction
term in your ANCOVA model. A significant interaction would imply that the relationship
between gestation and birth weight depends on the level of the dose variable. The code and
results are provided in the following listing.

The interaction is nonsignificant, supporting the assumption of equality of slopes. If the

assumption is untenable, you could try transforming the covariate or dependent variable, using
a model that accounts for separate slopes, or employing a nonparametric ANCOVA method
that doesn’t require homogeneity of regression slopes. See the sm.ancova() function in the sm
package for an example of the latter.

 Visualizing the results:

The ancova() function in the HH package provides a plot of the relationship between the
dependent variable, the covariate, and the factor. For example:
1. library(HH)
2. ancova(weight ~ gesttime + dose, data=litter)

produces the plot shown in the following figure 9.5. Note: the figure has been modified to
display better in black and white and will look slightly different when you run the code yourself.
Here you can see that the regression lines for predicting birth weight from gestation time are
parallel in each group but have different intercepts. As gestation time increases, birth weight
increases. Additionally, you can see that the 0-dose group has the largest intercept and the 5-
dose group has the lowest intercept. The lines are parallel because you’ve specified them to
be. If you’d used the statement ancova(weight ~ gesttime*dose) instead, you’d generate a plot
that allows both the slopes and intercepts to vary by group. This approach is useful for
visualizing the case where the homogeneity of regression slopes doesn’t hold.
weight ~ gesttime + dose
Two-way factorial ANOVA:

In a two-way factorial ANOVA, subjects are assigned to groups that are formed from the cross-
classification of two factors. This example uses the ToothGrowth dataset in the base
installation to demonstrate a two-way between-groups ANOVA. Sixty guinea pigs are randomly
assigned to receive one of three levels of ascorbic acid (0.5, 1, or 2mg), and one of two delivery
methods (orange juice or Vitamin C), under the restriction that each treatment combination
has 10 guinea pigs. The dependent variable is tooth length. The following listing shows the code
for the analysis.
The table statement indicates that you have a balanced design (equal sample sizes in each cell
of the design), and the aggregate statements provide the cell means and standard deviations.
The ANOVA table provided by the summary() function indicates that both main effects (supp
and dose) and the interaction between these factors are significant.
You can visualize the results in several ways. You can use the interaction.plot()
function to display the interaction in a two-way ANOVA. The code is
interaction.plot(dose, supp, len, type="b",
col=c("red","blue"), pch=c(16, 18),
main = "Interaction between Dose and Supplement Type")
and the resulting plot is presented in above figure. The plot provides the mean tooth length for
each supplement at each dosage.
With a little finesse, you can get an interaction plot out of the plotmeans() function in the gplots
package. The following code produces the graph in below figure:
library(gplots)
plotmeans(len ~ interaction(supp, dose, sep=" "), connect=list(c(1,3,5),c(2,4,6)), col=c("red",
"darkgreen"),
main = "Interaction Plot with 95% CIs", xlab="Treatment and Dose Combination")
The graph includes the means, as well as error bars (95 percent confidence intervals) and
sample sizes.
All graphs indicate that tooth growth increases with the dose of ascorbic acid for both orange
juice and Vitamin C. For the 0.5 and 1mg doses, orange juice produced more tooth growth than
Vitamin C. For 2mg of ascorbic acid, both delivery methods produced identical growth. Of the
three plotting methods provided, I prefer the in- teraction2wt() function in the HH package. It
displays both the main effects (the box plots) and the two-way interactions for designs of any
complexity (two-way ANOVA, three-way ANOVA, etc.).
5.6 ANOVA as regression:

We noted that ANOVA and regression are both special cases of the same general linear model.
As such, the designs in this chapter could have been analyzed using the lm() function. However,
in order to understand the output, you need to understand how R deals with categorical
variables when fitting models.
Consider the one-way ANOVA problem discussed earliear in this unit ,which compares the
impact of five cholesterol-reducing drug regiments (trt).
1. library(multcomp)
2. levels(cholesterol$trt)

[1] "1time" "2times" "4times" "drugD" "drugE"

First, let’s fit the model using the aov() function:
3. fit.aov <- aov(response ~ trt, data=cholesterol)
4. summary(fit.aov)
What are we looking at? Because linear models require numeric predictors, when the lm()
function encounters a factor, it replaces that factor with a set of numeric variables representing
contrasts among the levels. If the factor has k levels, k-1 contrast variables will be created. R
provides five built-in methods for creating these contrast variables in below table. You can also
create your own (we won’t cover that here). By default, treatment contrasts are used for
unordered factors and orthogonal polynomials are used for ordered factors.

If a patient is in the drugD condition, then the variable drugD equals 1, and the variables
2times, 4times, and drugE will each equal zero. You don’t need a variable for the first group,
because a zero on each of the four indicator variables uniquely deter- mines that the patient is
in the 1times condition.
This unit have covered the statistical methods most often used by researchers in a wide variety
of fields.

Level 2 Elementary Myperfectice
No ratings yet
Level 2 Elementary Myperfectice
101 pages
Tutorialspoint For R PDF
100% (2)
Tutorialspoint For R PDF
34 pages
R Programming Lab
No ratings yet
R Programming Lab
26 pages
R Tutorial PDF
100% (2)
R Tutorial PDF
196 pages
R Programming Lab
100% (1)
R Programming Lab
46 pages
R Tutorial
100% (2)
R Tutorial
196 pages
Schedule For Early Number Assessment (SENA 3) Recording Sheet
100% (2)
Schedule For Early Number Assessment (SENA 3) Recording Sheet
7 pages
R Programming ChatGPT
No ratings yet
R Programming ChatGPT
106 pages
Assignment For MCA 3rd Sem HPU R Programming
No ratings yet
Assignment For MCA 3rd Sem HPU R Programming
31 pages
R Programming Notes
100% (1)
R Programming Notes
32 pages
RProgramming
No ratings yet
RProgramming
192 pages
R Language 1st Unit Deep
100% (3)
R Language 1st Unit Deep
61 pages
Unit 1 Notes
No ratings yet
Unit 1 Notes
13 pages
R Language Lab Manual Lab 1
100% (1)
R Language Lab Manual Lab 1
33 pages
CS01207
No ratings yet
CS01207
3 pages
Final Demo Detailed Lesson Plan in Math 4
No ratings yet
Final Demo Detailed Lesson Plan in Math 4
7 pages
01-MSBA-615 - Introduction To R Programming and R Studio
No ratings yet
01-MSBA-615 - Introduction To R Programming and R Studio
47 pages
Module 1 - Introduction To R
No ratings yet
Module 1 - Introduction To R
18 pages
CH02 Introduction To R
No ratings yet
CH02 Introduction To R
22 pages
R Programming Lab
No ratings yet
R Programming Lab
46 pages
R Assignment Final
No ratings yet
R Assignment Final
12 pages
Basic+R Course
No ratings yet
Basic+R Course
30 pages
Genetics
No ratings yet
Genetics
392 pages
Unit 1
No ratings yet
Unit 1
22 pages
Sanju - R
No ratings yet
Sanju - R
34 pages
R Programming Lab Manual
No ratings yet
R Programming Lab Manual
73 pages
R Programming in Statistics
No ratings yet
R Programming in Statistics
403 pages
Statistical Methods Lab Manual-2021-22
No ratings yet
Statistical Methods Lab Manual-2021-22
58 pages
Unit 1
No ratings yet
Unit 1
19 pages
Research Methodology in Commerce Lab File - Nehal Garg
No ratings yet
Research Methodology in Commerce Lab File - Nehal Garg
110 pages
Unit 1 - Data Analysis Using R
No ratings yet
Unit 1 - Data Analysis Using R
28 pages
R With RStudio For Introductory Statistics
No ratings yet
R With RStudio For Introductory Statistics
163 pages
Stats With R
No ratings yet
Stats With R
103 pages
R Intro Script
No ratings yet
R Intro Script
86 pages
W1 Class Overview and R Basics
No ratings yet
W1 Class Overview and R Basics
33 pages
Computing-II - Lecture Notes-I
No ratings yet
Computing-II - Lecture Notes-I
72 pages
R Practical Report
No ratings yet
R Practical Report
55 pages
CS01208
No ratings yet
CS01208
3 pages
Ayush Lab File R
No ratings yet
Ayush Lab File R
25 pages
R Script
No ratings yet
R Script
25 pages
Lab Manual
No ratings yet
Lab Manual
46 pages
BA-unit 3.
No ratings yet
BA-unit 3.
17 pages
Ashish Srivastava R Lab File
No ratings yet
Ashish Srivastava R Lab File
25 pages
Lec1 PDF
No ratings yet
Lec1 PDF
25 pages
Lec1 PDF
No ratings yet
Lec1 PDF
25 pages
Lec1 PDF
No ratings yet
Lec1 PDF
25 pages
Lec1 PDF
No ratings yet
Lec1 PDF
25 pages
Lec1 PDF
No ratings yet
Lec1 PDF
25 pages
1.R Unit 1
No ratings yet
1.R Unit 1
49 pages
R Lanaguage
No ratings yet
R Lanaguage
25 pages
R Programming Lab
No ratings yet
R Programming Lab
48 pages
R Programmimg Lab FIle
No ratings yet
R Programmimg Lab FIle
35 pages
R Program Questions 1-24
No ratings yet
R Program Questions 1-24
56 pages
Bayes CPH - Tutorial R
No ratings yet
Bayes CPH - Tutorial R
9 pages
R Language Lab Manual Lab 1
No ratings yet
R Language Lab Manual Lab 1
32 pages
Topic 1 - Intro To Basics
No ratings yet
Topic 1 - Intro To Basics
38 pages
CH 3 PDF
No ratings yet
CH 3 PDF
50 pages
R Short Course
No ratings yet
R Short Course
40 pages
Dissertation Topics in Electronics and Communication
100% (1)
Dissertation Topics in Electronics and Communication
4 pages
Unit - 3
No ratings yet
Unit - 3
64 pages
4$20 Triangles$20 Proofs
No ratings yet
4$20 Triangles$20 Proofs
23 pages
Lab Manual (PHYS 001)
No ratings yet
Lab Manual (PHYS 001)
36 pages
Railway Track
No ratings yet
Railway Track
119 pages
Class XI Subject Mathematics Topic Mensuration Sub Topic (01) Volumes and Surface Areas No. of Sessions Three
No ratings yet
Class XI Subject Mathematics Topic Mensuration Sub Topic (01) Volumes and Surface Areas No. of Sessions Three
7 pages
Addition and Subtraction For First Grade Lesson Plan With Rubic 1
No ratings yet
Addition and Subtraction For First Grade Lesson Plan With Rubic 1
3 pages
Aristotle Galileo and Newton Reading With Reflection
No ratings yet
Aristotle Galileo and Newton Reading With Reflection
11 pages
Business Communication 1734430814974
No ratings yet
Business Communication 1734430814974
34 pages
Mechatronics
No ratings yet
Mechatronics
26 pages
Introductiontothemacroeconomicenvironmentpolicyplayersand National Income 1205241750676882805
No ratings yet
Introductiontothemacroeconomicenvironmentpolicyplayersand National Income 1205241750676882805
40 pages
A Review On Design of Pile Foundations in Bangkok: Geotechnical Engineering April 2015
No ratings yet
A Review On Design of Pile Foundations in Bangkok: Geotechnical Engineering April 2015
11 pages
The Binomial Theorem: Examples: X X N C C C C C C C
No ratings yet
The Binomial Theorem: Examples: X X N C C C C C C C
3 pages
从LQR角度看RL和控制
No ratings yet
从LQR角度看RL和控制
28 pages
Math 8 Condensed Syllabus 2024 - 2025
No ratings yet
Math 8 Condensed Syllabus 2024 - 2025
3 pages
Gauss-Siedel Method: Civil Engineering Majors Authors: Autar Kaw
No ratings yet
Gauss-Siedel Method: Civil Engineering Majors Authors: Autar Kaw
37 pages
Full Problems and Solutions in Introductory and Advanced Matrix Calculus 2nd Edition Willi-Hans Steeb PDF All Chapters
No ratings yet
Full Problems and Solutions in Introductory and Advanced Matrix Calculus 2nd Edition Willi-Hans Steeb PDF All Chapters
55 pages
The Relationship Between Market Sentiment and Stoc
No ratings yet
The Relationship Between Market Sentiment and Stoc
5 pages
Miller Indices
No ratings yet
Miller Indices
18 pages
Canadian Open Mathematics Challenge: The Canadian Mathematical Society
No ratings yet
Canadian Open Mathematics Challenge: The Canadian Mathematical Society
13 pages
Lesson Plan in Mathematics 09 Demo Final
No ratings yet
Lesson Plan in Mathematics 09 Demo Final
6 pages
Budget of Work: Mathematics 6 First Quarter
No ratings yet
Budget of Work: Mathematics 6 First Quarter
8 pages
MBA SEM2 Macroeconomics QA
No ratings yet
MBA SEM2 Macroeconomics QA
2 pages
Course Recommendations For Linked in Learning
No ratings yet
Course Recommendations For Linked in Learning
3 pages
ChE 310 - Mid Quiz Question - ChE 16
No ratings yet
ChE 310 - Mid Quiz Question - ChE 16
3 pages
E0 - 270 (On-Campus) - Practice Set
No ratings yet
E0 - 270 (On-Campus) - Practice Set
2 pages
Topological Transitivity
No ratings yet
Topological Transitivity
10 pages
AEM 3e Chapter 13
No ratings yet
AEM 3e Chapter 13
24 pages
CV 11 1744611868063
No ratings yet
CV 11 1744611868063
1 page
Inductors: Publishing As Pearson (Imprint) Boylestad
No ratings yet
Inductors: Publishing As Pearson (Imprint) Boylestad
62 pages
ECMC49F Midterm Solution 2
No ratings yet
ECMC49F Midterm Solution 2
13 pages
Video Rental Inventory System
No ratings yet
Video Rental Inventory System
4 pages
ChatGPT for Business: Strategies for Success
From Everand
ChatGPT for Business: Strategies for Success
Matthew C. Smith
1/5 (1)
Gray Hat Hacking the Ethical Hacker's
From Everand
Gray Hat Hacking the Ethical Hacker's
Çağatay Şanlı
5/5 (1)
Software Patterns Made Easy
From Everand
Software Patterns Made Easy
Justice Nanhou
No ratings yet
A To Z of Internet: Everything You Wanted to Know
From Everand
A To Z of Internet: Everything You Wanted to Know
Bittu Kumar
No ratings yet