Data Analytics Using R (1)

The document outlines the advantages of using R for data analytics, highlighting its statistical power, visualization capabilities, extensive libraries, and open-source nature. It details a workflow for data analytics using R, covering data collection, preprocessing, exploratory data analysis, statistical analysis, visualization, machine learning, and reporting. Additionally, it discusses the advantages and limitations of R, along with its applications in various fields such as business, healthcare, finance, and social media.

Uploaded by

aravioffl
Data Analytics Using R

Why Use R for Data Analytics?

1. Statistical Power: R is equipped with advanced statistical techniques for data analysis.
2. Visualization: Offers powerful tools like ggplot2 for creating elegant and insightful data visualizations.
3. Extensive Libraries: Thousands of packages (CRAN, Bioconductor) extend R’s functionality.
4. Open Source: Freely available and supported by a large community of developers and users.
5. Integration: Works well with databases, spreadsheets, and other programming languages.

Workflow of Data Analytics Using R

1. Data Collection:

○ Import data from various sources like CSV, Excel, databases, or web APIs.
○ Common functions and packages:
■ read.csv(): For CSV files.
■ read_excel(): For Excel files (from the readxl package).
■ DBI or RODBC: Packages for database connections.
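
As a minimal sketch of the import step, the snippet below writes a small data frame to a temporary CSV file and reads it back with read.csv(). The data and column names are made up for illustration. The readxl and DBI calls appear only as comments, since they assume those packages are installed and that a real file or database exists; the file and table names there are placeholders.

```r
# Create a small illustrative CSV in a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
sample_df <- data.frame(id = 1:3, score = c(82.5, 91.0, 77.25))
write.csv(sample_df, csv_path, row.names = FALSE)

# Import the CSV; stringsAsFactors = FALSE keeps text columns as character
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)

# Excel and database imports follow the same pattern (not run here;
# "sales.xlsx", "sales.sqlite", and "orders" are placeholder names):
# excel_df <- readxl::read_excel("sales.xlsx", sheet = 1)
# con <- DBI::dbConnect(RSQLite::SQLite(), "sales.sqlite")
# db_df <- DBI::dbGetQuery(con, "SELECT * FROM orders")
# DBI::dbDisconnect(con)
```

Calling str() right after an import is a quick way to confirm that each column was read with the expected type.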
2. Data Preprocessing:

○ Cleaning and transforming data for analysis.
○ Key operations:
■ Handling missing values: na.omit() to drop incomplete rows, or an imputation function such as impute() from the Hmisc package.
■ Data transformation: mutate() from the dplyr package.
■ Filtering: filter() from dplyr.
■ Reshaping data: tidyr::spread() and tidyr::gather() (superseded in current tidyr by pivot_wider() and pivot_longer()).
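
The preprocessing operations above can be sketched roughly as follows. To keep the example runnable without extra packages, it uses base R equivalents rather than dplyr (subset() in place of filter(), direct column assignment in place of mutate()). The built-in airquality dataset and the choice of mean imputation are assumptions made for illustration only.

```r
# airquality ships with R and has missing values in Ozone and Solar.R
data(airquality)
colSums(is.na(airquality))          # count missing values per column

# Option 1: drop every row that contains at least one NA
complete_rows <- na.omit(airquality)

# Option 2: simple mean imputation for one column (an illustrative choice)
imputed <- airquality
imputed$Ozone[is.na(imputed$Ozone)] <- mean(imputed$Ozone, na.rm = TRUE)

# Transformation: add a Celsius column (base R analogue of dplyr::mutate)
imputed$TempC <- (imputed$Temp - 32) * 5 / 9

# Filtering: keep only July (base R analogue of dplyr::filter)
july <- subset(imputed, Month == 7)
head(july)
```

Dropping rows is the simplest option but discards data; imputation keeps every row at the cost of introducing estimated values, so the right choice depends on the analysis.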
3. Exploratory Data Analysis (EDA):

○ Understanding the dataset through summary statistics and visualization.
○ Key functions:
■ summary(): Provides statistical summaries of data.
■ Visualization: hist(), boxplot(), plot(), or ggplot2.
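
A short EDA pass might look like the sketch below, which uses only base R functions on the built-in mtcars dataset. The choice of columns (mpg, hp, wt, cyl) is arbitrary and just meant to show the pattern.

```r
data(mtcars)

# Numeric summaries: min, quartiles, mean, max for selected columns
summary(mtcars[, c("mpg", "hp", "wt")])

# Group-wise mean of mpg by number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Base graphics: distribution of mpg, and mpg split by cylinder count
hist(mtcars$mpg, main = "Distribution of MPG", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by cylinder count", xlab = "Cylinders", ylab = "MPG")

# Pairwise correlation between fuel economy, horsepower, and weight
round(cor(mtcars[, c("mpg", "hp", "wt")]), 2)
```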
4. Statistical Analysis:

○ Perform hypothesis testing, regression analysis, or time-series forecasting.
○ Common techniques:
■ T-tests: t.test().
■ Linear Regression: lm().
■ ANOVA: aov().
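
The three techniques listed above can be tried on mtcars as in the following sketch. The specific questions asked (transmission type versus mpg, weight versus mpg, cylinder count versus mpg) are illustrative choices, not part of the original workflow.

```r
data(mtcars)

# Welch two-sample t-test: does mpg differ between automatic (am = 0)
# and manual (am = 1) transmissions?
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Simple linear regression: mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# One-way ANOVA: does mean mpg differ across cylinder counts?
# cyl is numeric in mtcars, so convert it to a factor first
anova_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(anova_fit)
```

Converting cyl to a factor matters here: left as a number, aov() would treat it as a continuous predictor rather than as three groups.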
5. Visualization:

○ Use R’s rich visualization libraries to create graphs and charts.
○ Popular libraries:
■ ggplot2: For custom, layered visualizations.
■ plotly: For interactive plots.
■ shiny: For web-based dashboards.
6. Machine Learning:

○ R supports machine learning algorithms for classification, regression, clustering, etc.
○ Popular packages:
■ caret: For ML workflows.
■ randomForest: For Random Forest models.
■ e1071: For SVM and Naive Bayes.
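
As one small illustration of the clustering case, the sketch below runs k-means with kmeans() from the stats package that ships with R. It deliberately does not use caret, randomForest, or e1071, so that it runs without installing anything. The iris dataset, the choice of three clusters, and the seed value are assumptions made for the example.

```r
data(iris)

# Scale the four numeric measurements so each contributes equally
features <- scale(iris[, 1:4])

# Fix the random seed so the cluster assignment is reproducible
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 25)

# Compare cluster membership with the known species labels
table(cluster = km$cluster, species = iris$Species)
```

The cross-tabulation at the end is only possible because iris happens to include labels; in a real clustering task there would be no such reference column.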
7. Reporting:

○ Create reports or dashboards to present findings.
○ Tools like R Markdown and Shiny enable creating interactive or static reports.

Example: Data Analytics in R

Dataset:

The "mtcars" dataset, included in R, contains fuel consumption and 10 design and performance measurements for 32 cars (1973-74 models).


Steps:

1. Load the Dataset:

    data(mtcars)    # load the built-in dataset
    head(mtcars)    # show the first six rows

2. Summary Statistics:

    summary(mtcars)    # min, quartiles, mean, and max for every column

3. Filter Data: Select cars with MPG greater than 20.

    library(dplyr)
    filtered_data <- filter(mtcars, mpg > 20)    # keep rows where mpg > 20
    head(filtered_data)

4. Visualization: Create a scatter plot of horsepower vs. MPG.

    library(ggplot2)
    ggplot(mtcars, aes(x = hp, y = mpg)) +
      geom_point() +                      # one point per car
      ggtitle("Horsepower vs. MPG")

5. Linear Regression: Build a regression model to predict MPG.

    model <- lm(mpg ~ hp + wt, data = mtcars)    # mpg explained by horsepower and weight
    summary(model)                               # coefficients, R-squared, p-values

6. Save Results: Export processed data to a CSV file.

    write.csv(filtered_data, "filtered_data.csv")    # written to the working directory
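
Once a model like the one in the regression step has been fitted, it can be used to score new observations. The sketch below refits the same formula so that it runs on its own, then calls predict() on two hypothetical cars; the hp and wt values in new_cars are invented for illustration.

```r
data(mtcars)
model <- lm(mpg ~ hp + wt, data = mtcars)

# Hypothetical cars to score; hp is gross horsepower, wt is weight in 1000 lbs
new_cars <- data.frame(hp = c(100, 200), wt = c(2.5, 3.5))

# Point predictions with 95% prediction intervals
predictions <- predict(model, newdata = new_cars, interval = "prediction")
print(predictions)
```

The column names in new_cars must match the predictor names in the model formula, otherwise predict() cannot find the variables it needs.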

Advantages of Using R

1. Comprehensive statistical analysis and visualization capabilities.
2. Access to cutting-edge techniques through community-contributed packages.
3. Strong support for data manipulation and transformation.
4. Free and open-source, making it accessible for learning and professional use.

Limitations of R

1. Steeper learning curve compared to other analytics tools.
2. Slower performance with extremely large datasets compared to Python.
3. Limited integration with web development workflows compared to other languages.

Applications of R in Data Analytics

1. Business Analytics: Customer segmentation, sales forecasting.
2. Healthcare: Survival analysis, predictive modeling for patient outcomes.
3. Finance: Portfolio optimization, fraud detection.
4. Social Media: Sentiment analysis, trend analysis.
