Course Title: Introduction To R in Business Applications
Ram Mohan Dhara
IMTG/ PGDM/ Term-V / 2019-2021
Session 3 : Intermediate R / Apply functions
Split your screen: one window for log-in and the other for hands-on practice.
Your Profs are rationally bounded! Q&A session only in the last 15 mins…
1. State: the 50 states of the US
2. Population: population estimate as of July 1, 1975
3. Income: per capita income (1974)
4. Illiteracy: illiteracy (1970, percent of population)
5. Life Exp: life expectancy in years (1969–71)
6. HS Grad: percent high-school graduates (1970)
7. Frost: mean number of days with minimum temperature below freezing in the capital or a large city
8. Area: land area in square miles
9. Crime: rate per 100,000 population (1976)
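The variables listed above closely match R's built-in state.x77 matrix. That this is the data set used in the session is an assumption; in state.x77 the last variable appears as the column Murder. Under that assumption, a minimal sketch of loading the data and using apply() to run a function over every column:

```r
# Assumption: the session data corresponds to the built-in state.x77 matrix
# (part of the 'datasets' package, which is attached by default).
states <- as.data.frame(state.x77)
states$State <- rownames(state.x77)   # state names are stored as row names

dim(state.x77)        # 50 rows (states), 8 numeric columns
colnames(state.x77)

# apply() with MARGIN = 2 runs the function once per column of the matrix
col_means  <- apply(state.x77, 2, mean)
col_ranges <- apply(state.x77, 2, function(x) max(x) - min(x))

round(col_means, 2)
```

Using MARGIN = 1 instead would run the function once per row, that is, once per state.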
Case Example - Iris flower
• The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the
British statistician and biologist Ronald Fisher in his 1936 paper on linear discriminant
analysis.
• The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris
virginica and Iris versicolor).
• Four features were measured from each sample: the length and the width of the sepals
and petals, in centimetres.
• Based on the combination of these four features, Fisher developed a linear discriminant
model to distinguish the species from each other.
• This data set became a typical test case for many statistical classification techniques in
machine learning.
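The data set described above ships with R as the data frame iris, so the structure can be checked directly. A short sketch (the object name species_counts is only illustrative):

```r
# iris is available in every R session via the 'datasets' package
str(iris)                                # 150 observations, 5 variables

# Four numeric measurements plus the Species factor
names(iris)

# 50 samples for each of the three species
species_counts <- table(iris$Species)
species_counts
```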
Parts of a flower
Images: Iris setosa, Iris versicolor, Iris virginica
REF : https://www.learnbyexample.org/
lapply()
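lapply() applies a function to each element of a list (for a data frame, each column) and always returns a list. A minimal sketch on the iris measurements; the object names are illustrative:

```r
# The four numeric measurement columns of the built-in iris data frame
measurements <- iris[, 1:4]

# One call to mean() per column; the result is a list of the same length
col_means <- lapply(measurements, mean)

class(col_means)          # "list"
names(col_means)          # same names as the input columns
col_means$Sepal.Length    # access one result by name
```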
sapply()
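sapply() works like lapply() but tries to simplify the result to a vector or matrix. A minimal sketch on the iris measurements; the object names are illustrative:

```r
measurements <- iris[, 1:4]

# Each call returns a single number, so the result simplifies to a named numeric vector
col_means <- sapply(measurements, mean)
is.numeric(col_means)

# Each call returns two numbers (min and max), so the result simplifies to a matrix
col_ranges <- sapply(measurements, range)
dim(col_ranges)           # 2 rows (min, max) by 4 columns
```

When the results cannot be simplified, for example because they differ in length, sapply() falls back to returning a list.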
tapply()
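tapply() splits a vector into groups defined by a factor and applies a function to each group. A minimal sketch computing the mean petal length per species; the object name is illustrative:

```r
# tapply(X, INDEX, FUN): split X by the groups in INDEX, then apply FUN to each group
mean_petal_by_species <- tapply(iris$Petal.Length, iris$Species, mean)

mean_petal_by_species
names(mean_petal_by_species)   # one entry per species level
```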
dplyr package in R
1. select() - selects columns of a data frame
2. filter() - filters a subset of rows from a data frame; multiple conditions can be combined
3. arrange() - sorts the rows of a data frame by one or more columns, say, by date
4. rename() - renames variables (columns)
5. mutate() - adds new variables to the data set or modifies existing ones
6. sample_n() / sample_frac() - select random rows from a data frame (dplyr has no sample() function; newer dplyr versions also offer slice_sample())
7. count() - counts the number of rows at each level of a factor; similar to the table() function
8. group_by() - groups data by one or more variables
9. summarise() - summarises variables, typically per group after group_by(); one of the most useful functions for exploratory data analysis (EDA)
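The verbs listed above can be sketched on the built-in iris data frame. This assumes the dplyr package is installed; Petal.Area and the object names are hypothetical, introduced only for illustration:

```r
library(dplyr)

# select(): keep chosen columns; filter(): keep rows meeting all conditions
long_petals <- iris %>%
  select(Species, Petal.Length, Petal.Width) %>%
  filter(Petal.Length > 5, Species == "virginica")

# rename(): new_name = old_name; mutate(): add a column; arrange(): sort rows
ranked <- iris %>%
  rename(species = Species) %>%
  mutate(Petal.Area = Petal.Length * Petal.Width) %>%   # hypothetical derived variable
  arrange(desc(Petal.Area))

# sample_n(): random rows (slice_sample() is the newer alternative)
set.seed(1)
random_rows <- sample_n(iris, 5)

# count(): number of rows per level, similar to table()
species_counts <- count(iris, Species)

# group_by() + summarise(): one summary row per group
species_summary <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal = mean(Sepal.Length), n = n())

species_summary
```

Chaining the verbs with %>% keeps each step readable from top to bottom, which is why the pipe is the usual style with dplyr.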
Summary : what we have learnt
• How to write programs more efficiently in R using:
1. apply()
2. lapply()
3. sapply()
4. tapply()
5. mapply()
• How to manage a data frame using functions of the dplyr package
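Of the functions in the list above, mapply() is the multivariate one: it walks over several vectors in parallel and passes the matching elements to the function. A minimal sketch; the object names are illustrative:

```r
# mapply(FUN, ...) calls FUN on the 1st elements of each argument, then the 2nd, and so on
petal_area <- mapply(function(len, wid) len * wid,
                     iris$Petal.Length, iris$Petal.Width)
length(petal_area)        # one result per row of iris

# Results of different lengths cannot be simplified, so a list is returned:
# rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)
repeated
```

For simple arithmetic like the first call, the vectorised expression iris$Petal.Length * iris$Petal.Width gives the same result; mapply() is most useful when the function itself is not vectorised.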
This concludes the session: Introduction to R