DS-2, Week 1, Lecture
1 LEARNING OBJECTIVES
1.1 Prerequisites for this course
• A general understanding of computer and data systems.
• A basic understanding of how smartphones and other day-to-day life devices work.
2 DATA SCIENCE
Data science is a field of study that combines subject-domain expertise, strong computing skills (including
programming in languages such as R, Python, C, and Fortran), and an understanding of statistical and
mathematical principles in order to draw meaningful insights from a given dataset. A data scientist is
expected to discover hidden patterns in raw data using a blend of statistical tools and techniques, newly
developed algorithms, and machine learning principles. Hence, a data scientist is a person with
multidisciplinary expertise spanning mathematics, statistics, IT, and the subject domain.
A data scientist must also possess some non-mathematical skills to understand the patterns in the data and
decipher the meaning of results in the light of the proposed objectives. In general, a data scientist oversees
the four ‘A’s of data: Data Architecture, Data Acquisition, Data Analysis, and Data Archiving. In a nutshell,
the following are the skills a data scientist should possess to be successful.
• Must possess knowledge of the application domain
• Must possess strong communication skills to translate back and forth between the data analysis team
and the end-user
• Must have an eye for the larger picture of a complex system
• Must have a good understanding of metadata (data or files that store information about the storage
and details of the data)
• Must possess the skills to transform, summarize, and interpret the results to draw meaningful inferences
• Must possess strong skills in data display and presentation
∗ Chirag Shah, A Hands-On Introduction to Data Science, Cambridge University Press, 2020
† Amity University Rajasthan (Jaipur)
Figure 1: Data Science is a field at the intersection of various expertise and domains as shown above.
Question: On average, how much increase can we expect in weight with an increase of one cm in height?
A simple method is to take the two extreme data points, compute the differences in height (191 − 147 = 44
cms) and weight (81 − 51 = 30 kgs), and divide the weight difference by the height difference: 30/44 ≈ 0.68.
In other words, on average, a difference of one cm in height corresponds to a difference of about 0.68 kgs in
weight.
height <- c(147, 149, 151, 153, 155, 157, 159, 163, 167, 171, 175, 179, 183, 187, 191)
weight <- c(51, 52, 53, 54, 55, 56, 57, 60, 63, 66, 69, 72, 75, 78, 81)
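The endpoint calculation described above can be sketched in R as follows. This is a minimal illustration rather than part of the original notes; the names `dh`, `dw`, and `slope_endpoints` are hypothetical, and the data vectors are repeated so the snippet runs on its own.

```r
# Height (cms) and weight (kgs) data from the lecture
height <- c(147, 149, 151, 153, 155, 157, 159, 163, 167, 171, 175, 179, 183, 187, 191)
weight <- c(51, 52, 53, 54, 55, 56, 57, 60, 63, 66, 69, 72, 75, 78, 81)

# Differences between the largest and smallest observations
dh <- max(height) - min(height)   # 191 - 147 = 44
dw <- max(weight) - min(weight)   # 81 - 51 = 30

# Average change in weight per cm of height, using only the two endpoints
slope_endpoints <- dw / dh
round(slope_endpoints, 2)         # 0.68
```

Because this estimate uses only two of the fifteen observations, it says nothing about how the points in between are distributed.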
Plot the data to visualize the pattern and look for a meaningful relationship between height and weight.
plot(height,weight,las=1,
type="b",col="blue",lwd=2,
xaxs = "i", yaxs = "i",
xlim=c(145,195),ylim=c(50,85),
xlab="Height (cms)", ylab="Weight (kgs)")
axis(2, at = seq(50,85,5), las = 2)
axis(1, at = seq(145,195,5), las = 1)
Plot the data again, this time including a regression analysis of the given data to extract meaningful
information.
# Plot the data points
plot(height,weight,las=1,
pch = 21, col = "darkred", bg = "darkred", lwd=2,
xaxs = "i", yaxs = "i",
xlim=c(145,195),ylim=c(50,85),
xlab="Height (cms)", ylab="Weight (kgs)", cex=1.4)
axis(2, at = seq(50,85,5), las = 2)
axis(1, at = seq(145,195,5), las = 1)
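The plotting code above draws only the data points. One way the regression line might be obtained and overlaid is sketched below, assuming an ordinary least-squares fit with `lm()`; the notes do not show which function was actually used, and the name `fit` is hypothetical.

```r
# Height (cms) and weight (kgs) data from the lecture
height <- c(147, 149, 151, 153, 155, 157, 159, 163, 167, 171, 175, 179, 183, 187, 191)
weight <- c(51, 52, 53, 54, 55, 56, 57, 60, 63, 66, 69, 72, 75, 78, 81)

# Ordinary least-squares fit of weight on height (assumed method)
fit <- lm(weight ~ height)

# Intercept and slope, rounded to two decimals
round(coef(fit), 2)

# Draw the points, then overlay the fitted line
plot(height, weight, pch = 21, col = "darkred", bg = "darkred",
     xlab = "Height (cms)", ylab = "Weight (kgs)")
abline(fit, col = "blue", lwd = 2)
```

For this dataset the rounded coefficients are a slope of 0.69 and an intercept of −52.37, which agrees with the equation used in the questions that follow.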
[Figure: scatter plot of Weight (kgs) against Height (cms).]
Question: What would you expect the weight to be of an Indian woman who is 145 cms tall?
wt = NULL
ht = 145
wt = 0.69*ht + (-52.37)
paste("The weight of the Indian woman having height of 145 cms is",wt,"(kgs)" )
## [1] "The weight of the Indian woman having height of 145 cms is 47.68 (kgs)"
Question: What would you expect the weight of someone who is 193 cms tall to be?
wt = NULL
ht = 193
wt = 0.69*ht + (-52.37)
paste("The weight of the Indian woman having height of 193 cms is",wt,"(kgs)" )
## [1] "The weight of the Indian woman having height of 193 cms is 80.8 (kgs)"
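The two answers above plug the heights into an equation whose coefficients are rounded to two decimals. As a hedged alternative, the sketch below uses `predict()` on a fitted `lm()` model so that the unrounded coefficients are used. The choice of `lm()` and the names `fit`, `new_heights`, and `pred` are assumptions and are not taken from the notes.

```r
# Height (cms) and weight (kgs) data from the lecture
height <- c(147, 149, 151, 153, 155, 157, 159, 163, 167, 171, 175, 179, 183, 187, 191)
weight <- c(51, 52, 53, 54, 55, 56, 57, 60, 63, 66, 69, 72, 75, 78, 81)

# Least-squares fit of weight on height (assumed method)
fit <- lm(weight ~ height)

# Heights at which a prediction is wanted; the column name must match the predictor
new_heights <- data.frame(height = c(145, 193))

# Predicted weights (kgs) from the unrounded coefficients
pred <- predict(fit, newdata = new_heights)
round(pred, 2)
```

Both 145 cms and 193 cms lie just outside the observed range of 147 to 191 cms, so these predictions are extrapolations. They also differ slightly from 47.68 and 80.8 because the coefficients are not rounded before use.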
[Figure: Weight (kgs) against Height (cms).]
[Figure: Weight (kgs) against Height (cms) with two fitted lines: y = 0.50x − 22.50 (R² = 1.000, p < 0.001) and y = 0.75x − 62.25 (R² = 1.000, p < 0.001).]
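The two equations in the figure can be reproduced by fitting separate lines to the lower and upper parts of the data. The sketch below assumes the split is at 159 cms, the point where the spacing of the observations changes. The notes do not state the breakpoint, so this choice and the names `d`, `fit_lower`, and `fit_upper` are assumptions.

```r
# Height (cms) and weight (kgs) data from the lecture
d <- data.frame(
  height = c(147, 149, 151, 153, 155, 157, 159, 163, 167, 171, 175, 179, 183, 187, 191),
  weight = c(51, 52, 53, 54, 55, 56, 57, 60, 63, 66, 69, 72, 75, 78, 81)
)

# Assumed breakpoint: the observation at 159 cms is included in both segments
fit_lower <- lm(weight ~ height, data = d[d$height <= 159, ])
fit_upper <- lm(weight ~ height, data = d[d$height >= 159, ])

# Intercept and slope of each segment, rounded to two decimals
round(coef(fit_lower), 2)   # intercept -22.5, slope 0.5
round(coef(fit_upper), 2)   # intercept -62.25, slope 0.75
```

Within each segment the points lie exactly on a straight line, which is why both fits report R² = 1.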
The single regression fitted to the whole dataset and the two separate regressions fitted to its lower and
upper parts give different predictions for the same heights, as the calculations below show.
wt1 = NULL
ht1 = 145
# Equation 1
wt1 = 0.50*ht1 + (-22.50)
paste("The weight of the Indian woman having height of 145 cms is",wt1,"(kgs)" )
## [1] "The weight of the Indian woman having height of 145 cms is 50 (kgs)"
wt2 = NULL
ht2 = 193
# Equation 2
wt2 = 0.75*ht2 + (-62.25)
paste("The weight of the Indian woman having height of 193 cms is",wt2,"(kgs)" )
## [1] "The weight of the Indian woman having height of 193 cms is 82.5 (kgs)"
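To make the disagreement between the two models concrete, the sketch below evaluates the single-line equation and the two-segment equations at the same heights, using the rounded coefficients quoted in these notes. The helper names `single_line` and `two_segments`, and the choice of 159 cms as the switch point between segments, are assumptions for illustration.

```r
# Single regression line fitted to all the data (rounded coefficients from the notes)
single_line <- function(ht) 0.69 * ht - 52.37

# Two-segment model; switching at 159 cms is an assumption
two_segments <- function(ht) {
  ifelse(ht <= 159, 0.50 * ht - 22.50, 0.75 * ht - 62.25)
}

# Heights asked about in the questions
ht <- c(145, 193)

# Side-by-side predictions (kgs) and their difference
comparison <- data.frame(
  height       = ht,
  single_line  = single_line(ht),
  two_segments = two_segments(ht)
)
comparison$difference <- comparison$two_segments - comparison$single_line
print(comparison)
```

The differences come to 2.32 kgs at 145 cms and 1.7 kgs at 193 cms, with the two-segment model predicting the higher weight in both cases.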