Module 7 (Data Analysis with R Programming)
Week-1
The exciting world of programming
Additional Readings
● https://medium.com/analytics-and-data/r-vs-python-a-comprehensive-guide-for-data-professionals-321e8dead598
● https://www.dataquest.io/blog/python-vs-r/
● https://blog.rstudio.com/2019/12/17/r-vs-python-what-s-the-best-for-language-for-data-science/
Programming as a data analyst
R Packages
Palmer penguins
● https://allisonhorst.github.io/palmerpenguins/
● To view the data: library(palmerpenguins), then View(penguins)
Tidyverse
● https://www.tidyverse.org/
Week-2
Programming fundamentals
R is case sensitive: a and A are different names.
The basic concepts of R
● functions,
● comments,
● variables,
● data types,
● vectors, and
● pipes.
Print command
● print()
● ?print() - opens the help page for print()
Comments
● # - everything after # on a line is a comment
Variables
● a <- "Dhamu"
● b <- 10
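A minimal R sketch of these basics: a comment, two variable assignments with <-, and print(). The extra variable B is a hypothetical addition to illustrate case sensitivity; ?print() is left out because it only opens a help page in an interactive session.

```r
# Everything after # on a line is a comment and is not run
a <- "Dhamu"   # assign a character string to a
b <- 10        # assign a number to b
B <- 20        # hypothetical: R is case sensitive, so B is a different variable from b

print(a)       # [1] "Dhamu"
print(b)       # [1] 10
b == B         # [1] FALSE
```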
Vector
A vector is a group of data elements of the same type stored in a sequence in R.
Example: z <- c(23, 45, 67)
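A short sketch of working with the vector z from the example: its length, indexing, and element-wise arithmetic. The vector mixed is a hypothetical extra showing what happens when types are combined, since every element of a vector must share one type.

```r
z <- c(23, 45, 67)   # numeric (double) vector
length(z)            # 3
z[2]                 # 45: R indexes from 1
z * 2                # 46 90 134: arithmetic applies to every element

# Mixing types coerces all elements to one common type
mixed <- c(1, "two", TRUE)
typeof(mixed)        # "character"
```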
Pipe
A pipe is a tool in R for expressing a sequence of multiple operations. It is represented by %>%.
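A minimal sketch comparing nested function calls with the same steps written as a pipe. It assumes the magrittr package, which provides %>% and is installed with the tidyverse; the numbers are illustrative.

```r
library(magrittr)  # provides %>%

values <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(values))

# Piped form: read left to right; each result is passed as the first argument of the next call
piped <- values %>% sqrt() %>% sum()

nested  # 6
piped   # 6
```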
Vectors and lists in R
Some commands
● typeof(a) - returns the data type of a
● is.integer(a) - checks whether a is an integer
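A sketch of typeof() and is.integer() on a few illustrative values, plus a list for comparison with a vector. The names a_int and info are hypothetical.

```r
a <- 10            # plain numbers are stored as double by default
typeof(a)          # "double"
is.integer(a)      # FALSE

a_int <- 10L       # the L suffix creates an integer
typeof(a_int)      # "integer"
is.integer(a_int)  # TRUE

# Unlike a vector, a list can hold elements of different types
info <- list(name = "Dhamu", score = 10, passed = TRUE)
typeof(info)       # "list"
typeof(info$name)  # "character"
str(info)          # one line per element, with its type
```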
Explore coding in R
Conditional statements
● if (condition) { ... }
● else if (condition) { ... }
● else { ... }
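In R, else and else if are keywords rather than functions; a condition goes in parentheses after if, and each branch goes in braces. A sketch with a hypothetical describe_number() function:

```r
# Hypothetical helper that classifies a single number
describe_number <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe_number(5)   # "positive"
describe_number(-2)  # "negative"
describe_number(0)   # "zero"
```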
Available R packages
Choosing the right packages
● https://www.tidyverse.org/
● https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
● https://cran.r-project.org/web/views/
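● install.packages("tidyverse") - installs the tidyverse packages
● library(ggplot2) - loads ggplot2, which includes the diamonds dataset
● View(diamonds) - opens the data in a spreadsheet-style viewer
● head(diamonds), glimpse(diamonds) - preview the first rows and each column (glimpse() comes from dplyr)
● str(diamonds) - shows the structure, including column names and data types
● colnames(diamonds) - shows only the column names
● mutate() - adds or changes columns in a data frame (from dplyr)
Of these, head(), glimpse(), str() and colnames() are summary functions.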
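A sketch of these commands on the diamonds dataset, assuming the tidyverse is installed (diamonds comes from ggplot2; glimpse() and mutate() come from dplyr). View() is left out because it needs an interactive session, and the carat_mg column is a hypothetical example of mutate().

```r
# Assumes install.packages("tidyverse") has already been run
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse() and mutate()

head(diamonds)      # first six rows
glimpse(diamonds)   # each column with its type and first values
str(diamonds)       # structure: dimensions, column names, types
colnames(diamonds)  # column names only

# mutate() returns a copy with a new column (1 carat = 200 mg)
diamonds_mg <- mutate(diamonds, carat_mg = carat * 200)
head(diamonds_mg$carat_mg)
```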
Data-import basics
Importing data and the readxl package
Refer to the PDF named “M7_W3_Data-import basics.pdf”.
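The details live in the referenced PDF, so this is only a minimal sketch, assuming readr and readxl are installed (both come with the tidyverse). The CSV is a temporary file created just for the example; the Excel file is one of the sample workbooks that readxl ships, located with readxl_example().

```r
library(readr)   # read_csv()
library(readxl)  # read_excel(), excel_sheets(), readxl_example()

# CSV: write a tiny example file to a temporary path, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), csv_path, row.names = FALSE)
scores <- read_csv(csv_path)

# Excel: use a sample workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)                        # names of the worksheets
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars")
head(mtcars_xl)
```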
Cleaning data
File-naming conventions
Give files easily understandable names, using underscores instead of spaces.
● https://speakerdeck.com/jennybc/how-to-name-files
● https://libguides.princeton.edu/c.php?g=102546&p=930626#:~:text=File%20naming%20best%20practices%3A&text=File%20names%20should%20be%20short,date%20format%20ISO%208601%3A%20YYYYMMDD
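A sketch of a hypothetical helper, make_file_name(), that follows these conventions: an ISO 8601 date (YYYYMMDD) first, then a short lowercase description joined with underscores. The date and description are illustrative.

```r
# Hypothetical helper: builds a file name from a date and a short description
make_file_name <- function(date, description, ext = "csv") {
  stamp <- format(as.Date(date), "%Y%m%d")                # ISO 8601 date: YYYYMMDD
  slug  <- gsub("[^a-z0-9]+", "_", tolower(description))  # lowercase; other characters -> _
  slug  <- gsub("^_+|_+$", "", slug)                      # trim leading/trailing underscores
  paste0(stamp, "_", slug, ".", ext)
}

make_file_name("2021-10-30", "Sales Data Q3")  # "20211030_sales_data_q3.csv"
```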
More on R operators
In R, there are four main types of operators:
1. Arithmetic
2. Relational
3. Logical
4. Assignment
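Arithmetic operators: + - * / ^ %% (remainder) %/% (integer division)
Relational operators: < > <= >= == !=
Logical operators: & (AND) | (OR) ! (NOT)
Assignment operators: <- = -> (and the less common <<- and ->>)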
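A quick sketch of each operator type, using the illustrative values x <- 7 and y <- 2:

```r
x <- 7; y <- 2   # assignment with <-

# Arithmetic
x + y    # 9
x - y    # 5
x * y    # 14
x / y    # 3.5
x ^ y    # 49
x %% y   # 1 (remainder)
x %/% y  # 3 (integer division)

# Relational: each returns TRUE or FALSE
x > y    # TRUE
x == y   # FALSE
x != y   # TRUE

# Logical
(x > 5) & (y > 5)  # FALSE: both sides must be TRUE
(x > 5) | (y > 5)  # TRUE: at least one side is TRUE
!(x > 5)           # FALSE

# Assignment
z = x + y   # = also assigns, but <- is the usual R style
x + y -> w  # rightward assignment
```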
Aesthetic attributes
Three commonly used aesthetic attributes in ggplot2 are:
● Color: this allows you to change the color of all of the points on your
plot, or the color of each data group
● Size: this allows you to change the size of the points on your plot by
data group
● Shape: this allows you to change the shape of the points on your plot
by data group
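A sketch of the difference between mapping an aesthetic to a data group and setting one value for all points. It assumes ggplot2 and uses its built-in mpg dataset rather than the penguins data; the columns and values chosen are illustrative.

```r
library(ggplot2)

# Inside aes(): color, size and shape vary by data group
p_grouped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class, size = cyl, shape = drv))

# Outside aes(): one fixed value applies to every point
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "purple", size = 3, shape = 17)

print(p_grouped)
print(p_fixed)
```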
Additional resources
● https://ggplot2.tidyverse.org/
● http://statseducation.com/Introduction-to-R/modules/graphics/aesthetics/
● https://www.rdocumentation.org/packages/ggplot2/versions/3.3.3/topics/aes
Smoothing
Smoothing enables the detection of a data trend even when you can't easily
notice a trend from the plotted data points.
Loess smoothing
The loess smoothing process is best for smoothing plots with fewer than 1,000
points.
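Gam smoothing
Gam (generalized additive model) smoothing is useful for smoothing plots with a large number of points.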
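A sketch of both methods with geom_smooth(), assuming ggplot2 and the mgcv package (a recommended package shipped with standard R installations, used for the gam fit). The mpg and diamonds datasets come with ggplot2 and are used here only for their sizes: mpg has 234 rows, diamonds has 53,940. When method is not given, geom_smooth() itself picks loess below 1,000 observations and gam above.

```r
library(ggplot2)

# Loess on a small dataset
p_loess <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Gam on a large dataset; s(x, bs = "cs") is a cubic regression spline from mgcv
p_gam <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

print(p_loess)
print(p_gam)
```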
R Markdown resources
R Markdown documentation
● https://rmarkdown.rstudio.com/lesson-1.html
R Markdown reference materials
● https://rmarkdown.rstudio.com/lesson-15.html
● https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf?_ga=2.49295910.1034302809.1602760608-739985330.1601281773
Week-2
● Basic concepts
● R Packages
Week-3
● Data frame
● Cleaning data
● Checking for bias
Week-4
● ggplot()
● Save plotted images
Week-5
● Jupyter notebook
● R Markdown notebook
Dhamodharan
30/10/2021