LEARNING R PROGRAMMING FOR DATA SCIENCE ENTHUSIASTS
Data science is a field that focuses on making sense of data. It involves collecting
data, cleaning it up, and looking for patterns and insights.
This often involves mathematics and computer algorithms. Data scientists
also create charts and graphs to help explain what the data means.
All of this helps businesses and organizations make better decisions and solve
problems.
1. Demand
According to the U.S. Bureau of Labor Statistics (BLS), employment of data
scientists is projected to grow 36% from 2021 to 2031, much faster than the
average for all occupations. This high demand makes data science a promising
career choice.
2. Growth
The average salary for a data scientist in the United States is $143,970.
3. Opportunities
4. Flexibility
Data scientists are needed in many sectors, including healthcare, finance,
manufacturing, and logistics.
R Programming
Why R Programming?
1. Download R for Windows from CRAN: https://cran.r-project.org/bin/windows/base/
2. Download RStudio Desktop from Posit: https://posit.co/download/rstudio-desktop/
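Once R and RStudio are installed, a quick check from the R console confirms the setup. This is a minimal sketch; the package list is simply the set of packages used later in this guide, and the install step needs an internet connection.

```r
# Print the version of R that is running
print(R.version.string)

# Check which of the packages used in this guide are already installed
needed <- c("dplyr", "nycflights13", "ggplot2")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Install any that are missing (only needed once, requires internet access)
if (!all(installed)) install.packages(needed[!installed])
```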
DATA MANIPULATION USING THE dplyr PACKAGE
1. Data Filtering
You can use filter() to select specific rows from a dataset based on
conditions, allowing you to focus on subsets of data that are relevant
to your analysis.
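A minimal sketch of filter(), using the built-in mtcars dataset so it runs without extra downloads (the dataset and thresholds are illustrative choices). Conditions separated by commas are combined with AND; %in% (or |) expresses OR.

```r
library(dplyr)

# Keep only cars with fuel economy above 25 mpg AND exactly 4 cylinders
efficient <- filter(mtcars, mpg > 25, cyl == 4)
efficient

# Keep cars that have either 6 or 8 cylinders (an OR condition via %in%)
big_engines <- filter(mtcars, cyl %in% c(6, 8))
nrow(big_engines)
```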
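# installing the packages (only needed once)
install.packages(c("dplyr", "nycflights13"))
library(dplyr)
library(nycflights13)
flights <- nycflights13::flights
# summarize: average air time (in minutes), ignoring missing values
summary_data <- summarise(flights, avg_air_time = mean(air_time, na.rm = TRUE))
summary_data
# summarize: total air time (in minutes) across all flights
summary_data <- summarise(flights, total_air_time = sum(air_time, na.rm = TRUE))
summary_data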
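summarise() becomes more useful when combined with group_by(), which splits the data so each group gets its own summary row. A minimal sketch, assuming grouping by airline carrier is of interest (the column choice is illustrative):

```r
library(dplyr)
library(nycflights13)

# Group flights by airline carrier, then summarise each group separately
by_carrier <- flights %>%
  group_by(carrier) %>%
  summarise(
    n_flights = n(),                             # number of flights per carrier
    avg_air_time = mean(air_time, na.rm = TRUE)  # average air time in minutes
  ) %>%
  arrange(desc(n_flights))                       # busiest carriers first
by_carrier
```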
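Arranging Data
# sort by year, then by departure time (earliest first); View() opens the RStudio data viewer
View(arrange(flights, year, dep_time))
# desc() reverses the order: latest departure time first
View(arrange(flights, year, desc(dep_time)))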
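View() is handy interactively in RStudio; in a script you may prefer to print the top rows. A minimal sketch, assuming the most delayed departures are of interest (the dep_delay column and the selected columns are illustrative choices):

```r
library(dplyr)
library(nycflights13)

# Sort by departure delay, largest first; arrange() always puts missing values (NA) last
most_delayed <- flights %>%
  arrange(desc(dep_delay)) %>%
  select(year, month, day, carrier, flight, dep_delay)

# Print the five most delayed departures instead of opening the viewer
head(most_delayed, 5)
```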
Nesting
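library(dplyr)
df <- mtcars
# nested calls run from the inside out:
# filter rows with mpg > 20, then sample 5 of them at random, then sort by mpg (highest first)
set.seed(42)   # makes the random sample reproducible
result <- arrange(sample_n(filter(df, mpg > 20), size = 5), desc(mpg))
result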
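Nested calls get hard to read as they grow. The same steps can be written left to right with the %>% pipe, which passes each result on as the first argument of the next function. A minimal sketch (the seed value is arbitrary):

```r
library(dplyr)

# Same steps as the nested call, written as a left-to-right pipeline
set.seed(42)  # make the random sample reproducible
result_piped <- mtcars %>%
  filter(mpg > 20) %>%    # 1. keep cars with mpg above 20
  sample_n(size = 5) %>%  # 2. pick 5 of them at random
  arrange(desc(mpg))      # 3. sort by mpg, highest first
result_piped

# With the same seed, the nested version gives an identical result
set.seed(42)
result_nested <- arrange(sample_n(filter(mtcars, mpg > 20), size = 5), desc(mpg))
identical(result_piped, result_nested)
```

In recent dplyr versions, sample_n() is superseded by slice_sample(n = 5), which does the same job.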
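install.packages("ggplot2")   # only needed once
library(ggplot2)
data()   # list the datasets available in the loaded packages
?BOD     # open the help page for the built-in BOD dataset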
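Before plotting, it helps to look at the data. BOD is a small data frame that ships with base R (the datasets package), so no installation is needed. A minimal sketch of common inspection functions:

```r
# Inspect the built-in BOD dataset
str(BOD)      # column names and types
head(BOD)     # first rows
summary(BOD)  # basic statistics per column
```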
Scatterplot
# one point per observation, joined by a red line
ggplot(BOD, aes(Time, demand)) +
  geom_point(size = 3) +
  geom_line(color = "red")
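A scatterplot can also encode a third variable through colour, and labs() adds readable axis titles. A minimal sketch using the built-in mtcars dataset (the variables and labels are illustrative choices):

```r
library(ggplot2)

# Scatterplot of car weight vs fuel economy, coloured by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Car weight vs fuel economy"
  )
p
```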
Box Plot
View(mpg)   # the mpg dataset comes with ggplot2
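A box plot summarises the distribution (median, quartiles, outliers) of a numeric variable for each category. A minimal sketch using the mpg dataset; plotting highway mileage (hwy) by vehicle class is an assumption chosen for illustration:

```r
library(ggplot2)

# Box plot: distribution of highway mileage for each vehicle class
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  labs(x = "Vehicle class", y = "Highway miles per gallon")
p_box
```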
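Bar Chart
library(dplyr)
library(ggplot2)
# count the cars for each number of cylinders, then plot the counts as bars
mtcars %>%
  group_by(cyl) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = as.factor(cyl), y = count)) +
  geom_col()   # same as geom_bar(stat = "identity"): bar heights come from `count`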
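When the bars should simply count rows, geom_bar() can do the counting itself, since its default is stat = "count". A minimal sketch of that shortcut, which produces the same cylinder counts without the group_by()/summarize() step:

```r
library(ggplot2)

# geom_bar() with its default stat = "count" tallies the rows per x value
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Number of cars")
p_count
```

Precomputing counts (as with geom_col()) is useful when the summary needs other statistics; the built-in count is shorter when a plain tally is enough.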