R Script Module 3

This document contains examples of analyzing credit scoring and employee attrition datasets using R. For the credit scoring data, it performs correlation, regression and visualization analyses to understand relationships between variables like balance, income and credit rating. For the employee data, it uses logistic regression to identify key drivers of attrition, and visualizes how variables like time spent at work and satisfaction relate to attrition rates. The document also includes two addendums that provide more rigorous approaches to the bubble charts created for time spent and satisfaction variables.

Uploaded by

Vaish Navi


############################################################

# Foundation to Strategic Business Analytics #

# Module 3 - Understanding causes and consequences #

# #

# Author: Nicolas Glady & Pauline Glikman #

# ESSEC BUSINESS SCHOOL #

############################################################

# Disclaimer: this script is used to produce the examples #

# presented during the course Strategic Business #

# Analytics. The author is not responsible in any way #

# for any problem encountered during this code execution. #

############################################################

#### EXAMPLE N°1 - CREDIT SCORING ####

############################################################

# Set your directory to the folder where you have downloaded the Credit Scoring dataset

# To clean up the memory of your current R session run the following line

rm(list=ls(all=TRUE))

# Let's load our dataset and call it data

data=read.table('DATA_3.01_CREDIT.csv',sep=',',header=TRUE) # The function read.table enables us to read flat files such as .csv files

# Now let's have a look at our variables and see some summary statistics

str(data) # The str() function shows the structure of your dataset and details the type of variables that it contains

summary(data) # The summary() function provides, for each numeric variable in your dataset, the minimum, quartiles, mean and maximum

hist(data$Rating) # Produce a histogram of the credit scores

cor(data[,c(1:5,10)]) # Compute the correlation between all the numerical variables of the sample
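# A minimal sketch of an alternative to hard-coding column positions such as c(1:5,10):
# select the numeric columns programmatically. The data frame "toy" and its columns are
# hypothetical stand-ins, not the actual credit scoring dataset.

```r
# Hypothetical stand-in data: two numeric columns and one factor
set.seed(1)
toy <- data.frame(Income = rnorm(50, 45, 10),
                  Rating = rnorm(50, 350, 60),
                  Gender = factor(sample(c("Female", "Male"), 50, replace = TRUE)))

numericCols <- sapply(toy, is.numeric) # TRUE for numeric columns, FALSE otherwise
corToy <- cor(toy[, numericCols])      # Correlation matrix of the numeric columns only
print(round(corToy, 2))
```

# This keeps working if columns are reordered, at the cost of also picking up any
# numeric identifier column that should not be correlated.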

linreg=lm(Rating~.,data=data) # Estimate a linear regression model of Rating as a function of everything else.

cor(linreg$fitted.values,data$Rating) # Computes the correlation between the fitted values and the actual ones
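# For a least-squares regression with an intercept, the squared correlation between
# fitted and actual values equals the R-squared reported by summary(). A minimal sketch
# on simulated data (x, y and fit are illustrative names, not part of the course data):

```r
set.seed(2)
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100)  # Simulated linear relationship with noise
fit <- lm(y ~ x)

r2FromCor <- cor(fit$fitted.values, y)^2  # Squared correlation fitted vs. actual
r2FromSummary <- summary(fit)$r.squared   # R-squared reported by the model summary
print(c(r2FromCor, r2FromSummary))
```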

plot(data$Rating,linreg$fitted.values) # Plot the fitted values vs. the actual ones

summary(linreg) # Reports the results of the regression

plot(data$Balance,data$Rating) # Visualize the relationship between Balance and Rating

plot(data$Income,data$Rating) # Visualize the relationship between Income and Rating

############################################################

#### EXAMPLE N°2 - HR ANALYTICS 2 ####

############################################################

# Set your directory to the folder where you have downloaded the HR Analytics 2 dataset

# To clean up the memory of your current R session run the following line

rm(list=ls(all=TRUE))
# Let's load our dataset and call it datatot

datatot=read.table('DATA_3.02_HR2.csv',header=TRUE,sep=',') # TRUE is safer than T, which can be reassigned

# Now let's have a look at our variables and see some summary statistics

str(datatot) # The str() function shows the structure of your dataset and details the type of variables that it contains

summary(datatot) # The summary() function provides, for each numeric variable in your dataset, the minimum, quartiles, mean and maximum

table(datatot$left) # look at the frequencies for the left variable

table(datatot$left)/nrow(datatot) # look at percentages for the left variable

hist(datatot$left) # alternatively, plot a histogram

cor(datatot) # Let's check out the correlations

logreg = glm(left ~ ., family=binomial(logit), data=datatot) # Estimate the drivers of attrition

hist(logreg$fitted.values) # Histogram of the predicted probabilities of leaving according to the model

cor(logreg$fitted.values,datatot$left) # Assess the correlation between estimated attrition and actual

cutoff=.3 # Cutoff above which the predicted P[leaving] classifies an employee as a leaver. Note you can play with it...

sum((logreg$fitted.values<=cutoff)&(datatot$left==0))/sum(datatot$left==0) # Compute the percentage of correctly classified employees who stayed

sum((logreg$fitted.values>cutoff)&(datatot$left==1))/sum(datatot$left==1) # Compute the percentage of correctly classified employees who left

mean((logreg$fitted.values>cutoff)==(datatot$left==1)) # Compute the overall percentage of correctly classified employees
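# The three rates above can also be read off a single confusion table. A minimal sketch
# on simulated outcomes and probabilities (actual, fittedProb and predictedClass are
# hypothetical names and values, not the HR dataset or the fitted model):

```r
set.seed(3)
actual <- rbinom(200, 1, 0.25)  # Hypothetical 0/1 outcomes (1 = left)
fittedProb <- runif(200)        # Hypothetical predicted probabilities
cutoff <- 0.3
predictedClass <- as.integer(fittedProb > cutoff)

# Fix the factor levels so the table is always 2 x 2
confusion <- table(Actual = factor(actual, levels = 0:1),
                   Predicted = factor(predictedClass, levels = 0:1))
print(confusion)
print(prop.table(confusion, 1))  # Row shares: diagonal = correct rate per actual class

accuracy <- sum(diag(confusion)) / sum(confusion)  # Overall share correctly classified
print(accuracy)
```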
summary(logreg) # Report the results of the logistic regression
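# The coefficients reported by summary() for a logistic regression are on the log-odds
# scale; exponentiating them gives odds ratios, which are often easier to read. A minimal
# sketch on simulated data (tenure, leftSim and toyFit are hypothetical names):

```r
set.seed(4)
tenure <- runif(300, 0, 10)                           # Hypothetical driver
leftSim <- rbinom(300, 1, plogis(-2 + 0.3 * tenure))  # Simulated 0/1 attrition
toyFit <- glm(leftSim ~ tenure, family = binomial(logit))

oddsRatios <- exp(coef(toyFit))  # Multiplicative change in the odds per unit increase
print(oddsRatios)
```

# An odds ratio above 1 means the odds of leaving increase with the variable;
# below 1 means they decrease.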

# Let's use a more visual way to see the effect of one of the most important drivers: TIC

plot(datatot$TIC,datatot$left,main= "Time and Employee Attrition",

ylab="Attrition", xlab= "Time spent")

# An aggregated plot

tempdata=datatot

aggbTimeRank=aggregate(left~ TIC, data=tempdata, FUN=mean) # We compute the average attrition rate for each value of TIC

plot(aggbTimeRank$TIC,aggbTimeRank$left,main= "Time and Employee Attrition",

ylab="Average Attrition Rate", xlab= "Time spent")

# An even better one!

cntbTimeRank=aggregate(left~ TIC, data=tempdata, FUN=length) # We compute the number of employees for each value of TIC

symbols(aggbTimeRank$TIC,aggbTimeRank$left,circles=cntbTimeRank$left, inches=.75, fg="white", bg="red",main= "Time and Employee Attrition", ylab="Average Attrition Rate", xlab= "Time spent")

# (See Addendum A for a more rigorous approach)

# Let's use a more visual way to see the effect of the most important driver: Satisfaction

tempdata=datatot

tempdata$rankSatis = round(rank(-tempdata$S)/600) # We create categories of employee satisfaction ranking. We create 20 groups (because it will work well later...)

aggbSatisRank = aggregate(left~ rankSatis, data=tempdata, FUN=mean) # We compute the average attrition rate for each category

cntbSatisRank = aggregate(left~ rankSatis, data=tempdata, FUN=length) # We compute the number of employees in each category

symbols(aggbSatisRank$rankSatis,aggbSatisRank$left,circles=cntbSatisRank$left,
inches=.2, fg="white", bg="red",main= "Satisfaction and Employee Attrition",
ylab="Average Attrition Rate", xlab= "Rank of Satisfaction")
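# A minimal sketch of how the round(rank/600) grouping behaves, assuming a hypothetical
# 12000 employees with untied ranks (the sample size is not stated in this script).
# Under that assumption round() yields 21 distinct values (0 to 20), with smaller first
# and last groups; ceiling() is shown as one possible alternative that gives 20 groups of
# equal size. The names n, ranks, groupsRound and groupsCeiling are illustrative.

```r
n <- 12000     # Assumed number of employees (hypothetical)
ranks <- 1:n   # Untied ranks, for illustration only

groupsRound <- round(ranks / 600)      # Grouping rule used above
groupsCeiling <- ceiling(ranks / 600)  # Alternative rule

print(length(unique(groupsRound)))    # Number of distinct groups with round()
print(table(groupsRound)[c(1, 21)])   # Sizes of the first and last groups
print(length(unique(groupsCeiling)))  # Number of distinct groups with ceiling()
```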

# (See Addendum B for a more rigorous approach)

################################################################
## Addendum ##
## Contributed by Stefan Avey ##
## https://github.com/stefanavey/strategic-business-analytics ##
################################################################

## The Bubble Charts in the last 2 examples can be made more rigorously by adjusting
## the area rather than the radius.

#######

## Addendum A ##

#######

## Humans perceive the area of shapes like circles. So if some value is twice as large,
## we want the area of the circle to be twice as large, not the radius. The symbols()
## function takes the radius of the circles by default, so we need to compute the radius
## in order to end up with the desired size of circles.

size = cntbTimeRank$left

radius = sqrt(size / pi)

symbols(x = aggbTimeRank$TIC, y = aggbTimeRank$left,

circles = radius, inches = .75, fg = "white", bg = "red",

main = "Time and Employee Attrition",

ylab = "Average Attrition Rate", xlab = "Time spent")
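## A small arithmetic check of the reasoning above, using made-up counts (sizeDemo is
## hypothetical): passing counts directly as radii makes the area grow with the square
## of the count, while sqrt(count / pi) keeps the area proportional to the count.

```r
sizeDemo <- c(100, 200, 400)         # Hypothetical group sizes, each double the previous

naiveArea <- pi * sizeDemo^2         # Area when the count is used as the radius
radiusDemo <- sqrt(sizeDemo / pi)    # Radius chosen so that area equals the count
scaledArea <- pi * radiusDemo^2

print(naiveArea / naiveArea[1])      # Relative areas with the naive approach
print(scaledArea / scaledArea[1])    # Relative areas with area-based scaling
```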

#######

## Addendum B ##

#######

## Instead of creating roughly equal size groups of 20 by rank,

## we create 20 bins of equal size between 0 and 1 and assign each

## employee to 1 bin based on Satisfaction

bins = 20

breakPoints = seq(0, 1, length.out = (bins+1))

tempdata$rankSatis = (bins+1) - as.numeric(cut(tempdata$S, breaks = breakPoints))

## Visually, these are the bins we are choosing:

hist(tempdata$S, breaks = breakPoints)

abline(v = breakPoints, col = "red", lty = 2)
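## A minimal sketch of how cut() assigns values to these bins, using made-up satisfaction
## scores (sDemo is hypothetical). By default the intervals are open on the left, so a
## score of exactly 0 would get NA unless include.lowest = TRUE is set; whether that
## matters depends on whether the data contain such a score.

```r
demoBreaks <- seq(0, 1, length.out = 21)  # Same 20 equal-width bins as above
sDemo <- c(0, 0.42, 1)                    # Hypothetical satisfaction scores

binDefault <- as.numeric(cut(sDemo, breaks = demoBreaks))
binInclusive <- as.numeric(cut(sDemo, breaks = demoBreaks, include.lowest = TRUE))

print(binDefault)    # Bin index per score with the default settings
print(binInclusive)  # Bin index per score when the lowest break is included
```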

## Note that the first bin 0-0.05 has no employees (so the circle should be size 0)

aggbSatisRank = aggregate(left~ rankSatis, data=tempdata, FUN=mean) # We compute the average attrition rate for each category

cntbSatisRank = aggregate(left~ rankSatis, data=tempdata, FUN=length) # We compute the number of employees in each category

## Again here, we want to size the circles by their area, not radius

size <- cntbSatisRank$left

radius <- sqrt(size / pi)

symbols(x = aggbSatisRank$rankSatis, y = aggbSatisRank$left,

circles = radius, inches = 0.2, fg = "white", bg = "red",

main = "Satisfaction and Employee Attrition",

ylab = "Average Attrition Rate", xlab = "Rank of Satisfaction")
