LEARNING R PROGRAMMING FOR DATA SCIENCE ENTHUSIASTS

The document provides an overview of data science, emphasizing its importance in decision-making and the growing demand for data scientists. It introduces R programming as a statistical language suitable for data analysis and visualization, highlighting the dplyr package for data manipulation and ggplot2 for data visualization. Additionally, it includes practical examples and questions related to data analysis using R.

Uploaded by

A.S. ROHIT


Learning R Programming for Data Science Enthusiasts

What is Data Science?

Data science is a field that focuses on making sense of data. It involves collecting
data, cleaning it up, and looking for patterns and insights.

This often includes using math and computer algorithms. Data scientists
also create visual charts and graphs to help explain what the data means.

How is Data Science Helpful?

Data science helps businesses and organizations make better decisions and solve
problems.

Why Should You Learn Data Science?

1. Demand
According to the U.S. Bureau of Labor Statistics (BLS), employment of data
scientists is projected to grow by 36% from 2021 to 2031, far surpassing the
average job growth rate. This high demand makes data science a promising
career choice.
2. Growth
The average salary for a data scientist in the United States is $143,970.
3. Opportunities

4. Flexibility
Data scientists are needed in many sectors, including healthcare, finance,
manufacturing, and logistics.

R Programming

Why R Programming?

Python is a general-purpose language used to develop and deploy a wide range of projects.

R is a statistical language used for the analysis and visual representation of data.

Python is better suited to machine learning, deep learning, and large-scale web applications.

R is well suited to statistical learning and has powerful libraries for data experimentation and exploration.

R’s statistical packages are highly powerful.

Installing R and RStudio

1. R (Windows installer): https://cran.r-project.org/bin/windows/base/
2. RStudio Desktop: https://posit.co/download/rstudio-desktop/

DATA MANIPULATION USING THE dplyr PACKAGE

The dplyr package is used to transform and summarize tabular data (data with rows and columns).

It works with data frames, including built-in datasets once they are converted to data frames or data tables.

For many common tasks it is faster and easier to use than base R.
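
As a rough illustration of the comparison with base R, the sketch below performs the same row filter and column selection both ways on the built-in mtcars dataset. The 25 mpg threshold and the variable names are arbitrary choices for this example and are not part of the original material. It assumes dplyr is already installed.

```r
# Assumes dplyr is already installed
library(dplyr)

# Base R: subset rows with logical indexing, then pick columns by name
base_result <- mtcars[mtcars$mpg > 25, c("mpg", "hp")]

# dplyr: the same two steps written as a pipeline of verbs
dplyr_result <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, hp)

# Check that both approaches return the same values
identical(base_result$mpg, dplyr_result$mpg)
identical(base_result$hp, dplyr_result$hp)
```

The dplyr version names each step (filter, then select), which tends to stay readable as more steps are added.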

Things that can be done with dplyr

1. Data Filtering

You can use filter() to select specific rows from a dataset based on
conditions, allowing you to focus on subsets of data that are relevant
to your analysis.

# install the package (only needed once)
install.packages("dplyr")

# library() loads an installed package into the current R session
library(dplyr)

# install and load the package that provides the flights dataset
install.packages("nycflights13")
library(nycflights13)

# view the dataset
View(flights)
head(flights)

# subset data using filter()

# keep only the flights that operated in July
July_flights <- filter(flights, month == 7)
View(July_flights)

# keep only the flights that operated on July 3
July_flights_3 <- filter(flights, month == 7, day == 3)
View(July_flights_3)

# base R equivalent: flights that left LGA on September 3
head(flights[flights$month == 9 & flights$day == 3 & flights$origin == "LGA", ])

# look at specific rows by position
slice(flights, 1:5)

# mutate() adds a new column and keeps the existing ones
flights <- mutate(flights, overall_delay = arr_delay - dep_delay)

# transmute() keeps only the new column, so store the result separately
delays_only <- transmute(flights, overall_delay = arr_delay - dep_delay)

View(flights)

# reload the original dataset from the package
library(nycflights13)
flights <- nycflights13::flights

# summarise() collapses a data frame into a single row of statistics

# mean air time
summary_data <- summarise(flights, avg_air_time = mean(air_time, na.rm = TRUE))
summary_data

# total air time
summary_data <- summarise(flights, total_air_time = sum(air_time, na.rm = TRUE))
summary_data

# standard deviation of air time
summary_data <- summarise(flights, sd_air_time = sd(air_time, na.rm = TRUE))
summary_data

Grouping Data

# group the mtcars dataset by number of gears
by_gear <- mtcars %>% group_by(gear)

# get the sum and the average for each group
a <- summarize(by_gear, gear1 = sum(gear), gear2 = mean(gear))
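
Grouped summaries are usually more informative when the summarised column differs from the grouping column. The following is a minimal sketch along those lines, assuming dplyr is installed. The column names n_cars, mean_mpg and max_hp are made up for illustration.

```r
library(dplyr)

# One row per gear value, with a count and two summary statistics
gear_summary <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_cars   = n(),        # number of cars in each group
    mean_mpg = mean(mpg),  # average fuel economy in each group
    max_hp   = max(hp)     # highest horsepower in each group
  )

gear_summary
```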

Sampling

# draw 15 random rows
sample_n(flights, 15)

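Arranging the Dataset

# sort by year, then by departure time (ascending)
View(arrange(flights, year, dep_time))
# sort by year, then by departure time (descending)
View(arrange(flights, year, desc(dep_time)))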

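Nesting

df <- mtcars

# nested calls are read from the inside out
result <- arrange(sample_n(filter(df, mpg > 20), size = 5), desc(mpg))
result

# the same kind of steps written as a pipeline, read left to right
result <- df %>% filter(mpg > 20) %>% sample_n(size = 10) %>% arrange(desc(mpg))
result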

Selecting the columns

df_mpg_hp <- df %>% select(mpg, hp)
df_mpg_hp

Questions:

1. How many passengers traveled in the Titanic?

2. How many passengers survived?
3. How many passengers did not survive?
4. Which category of passengers was the largest?
5. What is the median age of passengers in the dataset?
6. How many male passengers are in the dataset?
7. How many female passengers are in the dataset?
8. How many passengers were in each class (1st, 2nd, 3rd)?
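
The handout does not say which Titanic dataset these questions refer to. As a hedged sketch, the code below uses R's built-in `Titanic` table as a stand-in. That is an assumption: the built-in table includes crew and records age only as Child/Adult, so a median age (question 5) cannot be computed from it. The object names are illustrative.

```r
library(dplyr)

# Assumption: the built-in Titanic table stands in for the handout's dataset
titanic_df <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Total number of people recorded (each row is a group with a count in Freq)
total_people <- sum(titanic_df$Freq)

# Counts by survival, sex and class, weighting each row by its Freq
by_survival <- titanic_df %>% count(Survived, wt = Freq, name = "people")
by_sex      <- titanic_df %>% count(Sex, wt = Freq, name = "people")
by_class    <- titanic_df %>% count(Class, wt = Freq, name = "people")

total_people
by_survival
by_sex
by_class
```

With a passenger-level dataset (one row per person), the same questions could be answered with `nrow()`, `count()` and `median(age, na.rm = TRUE)` instead of weighted counts.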

DATA VISUALIZATION USING ggplot2

# install the package (only needed once), then load it
install.packages("ggplot2")
library(ggplot2)

# list the built-in datasets and open the help page for BOD
data()
?BOD

Scatterplot

ggplot(BOD, aes(Time, demand)) +
  geom_point(size = 3) +
  geom_line(color = "red")

CO2 %>% ggplot(aes(conc, uptake, colour = Treatment)) + geom_point()

CO2 %>% ggplot(aes(conc, uptake, colour = Treatment)) +
  geom_point(size = 3, alpha = 0.5) + geom_smooth()

CO2 %>% ggplot(aes(conc, uptake, colour = Treatment)) +
  geom_point(size = 3, alpha = 0.5) + geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Concentration of CO2") + theme_classic()

Boxplot

CO2 %>% ggplot(aes(Treatment, uptake)) +
  geom_boxplot()

CO2 %>% ggplot(aes(Treatment, uptake)) +
  geom_boxplot() + geom_point()

# size is a fixed value, so it goes outside aes(); colour is mapped to Plant
CO2 %>% ggplot(aes(Treatment, uptake)) +
  geom_boxplot() + geom_point(aes(colour = Plant), size = 3)

CO2 %>% ggplot(aes(Treatment, uptake)) +
  geom_boxplot() + geom_point(aes(colour = Plant), size = 3, alpha = 0.5)

CO2 %>% ggplot(aes(Treatment, uptake)) +
  geom_boxplot() + geom_point(aes(colour = Plant), size = 3, alpha = 0.5) +
  coord_flip() +
  theme_bw() +
  labs(title = "Chilled vs Non-chilled")

View(mpg)

Questions using mtcars

1. Is there a relationship between a car's weight and its miles per
gallon (MPG)?

mtcars %>% ggplot(aes(wt,mpg)) + geom_point()

2. What is the distribution of car counts based on the number of

cylinders?

mtcars %>%
group_by(cyl) %>%
summarize(count = n()) %>%
ggplot(aes(x = as.factor(cyl), y = count)) +
geom_bar(stat = "identity")

3. What is the distribution of MPG among the cars in the dataset?

mtcars %>% ggplot(aes(mpg))+geom_histogram()

4. Does horsepower (hp) change with engine displacement (cubic inches)?

mtcars %>% ggplot(aes(disp, hp)) + geom_point()
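
For question 1, a scatterplot shows the direction of the relationship, and a correlation coefficient can put a number on it. The sketch below is one possible way to do both, assuming ggplot2 is installed. The names wt_mpg_cor and wt_mpg_plot are illustrative, and the straight-line fit is a simplifying assumption rather than a claim that the relationship is linear.

```r
library(ggplot2)

# Pearson correlation between weight and miles per gallon (base R)
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
wt_mpg_cor

# Scatterplot with a fitted straight line to show the overall trend
wt_mpg_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

wt_mpg_plot
```

A negative value of wt_mpg_cor indicates that heavier cars tend to have lower fuel economy in this dataset.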
