Data Analysis in R

The document provides an overview of data analysis functions in R, including built-in functions for summarizing, manipulating, and visualizing data. It covers common functions such as summary(), mean(), sd(), and t.test(), as well as data manipulation functions from the dplyr package and visualization functions from base R and ggplot2. Additionally, it discusses inferential statistics functions and their applications in R.

Uploaded by

lsivakum

Introduction to Data Analysis Functions in R

R is a powerful programming language widely used for statistical computing and data analysis. It
provides numerous built-in functions that simplify the process of analyzing data. These functions can
perform various tasks, including summarizing data, manipulating datasets, and visualizing results.

Common Data Analysis Functions in R

1. summary()

• Description: Provides summary statistics (minimum, quartiles, median, mean, and maximum) for
each column of a dataset.

• Syntax: summary(object)

• Example:

    df <- data.frame(
      age = c(25, 30, 35, 40),
      salary = c(50000, 60000, 70000, 80000)
    )
    summary(df)

• Output:

          age            salary
     Min.   :25.00   Min.   :50000
     1st Qu.:28.75   1st Qu.:57500
     Median :32.50   Median :65000
     Mean   :32.50   Mean   :65000
     3rd Qu.:36.25   3rd Qu.:72500
     Max.   :40.00   Max.   :80000

2. mean()

• Description: This function calculates the average of a numeric vector.

• Syntax: mean(x, na.rm = FALSE)

• Example:

    salaries <- c(50000, 60000, NA, 80000)
    mean(salaries, na.rm = TRUE) # Excludes NA values

• Output:

    [1] 63333.33

3. sd()

• Description: This function computes the sample standard deviation of a numeric vector.

• Syntax: sd(x, na.rm = FALSE)

• Example:

    sd(salaries, na.rm = TRUE) # Excludes NA values; salaries comes from the mean() example

• Output:

    [1] 15275.25

4. t.test()

• Description: This function performs a t-test to compare means between two groups.

• Syntax: t.test(x, y = NULL, alternative = "two.sided", ...)

• Example:

    group1 <- c(50000, 60000)
    group2 <- c(70000, 80000)
    t.test(group1, group2)

• Output (example):

        Welch Two Sample t-test

    data:  group1 and group2
    t = -2.8284, df = 2, p-value = 0.1056
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -50424.35  10424.35
    sample estimates:
    mean of x mean of y
        55000     75000
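The printed report is meant for reading, but t.test() also returns a list (class "htest") whose elements can be used directly in code. A minimal sketch using the same two groups; the variable name res is only illustrative:

```r
group1 <- c(50000, 60000)
group2 <- c(70000, 80000)

res <- t.test(group1, group2)   # Welch two-sample t-test (the default)

res$statistic   # t statistic
res$parameter   # degrees of freedom (Welch approximation)
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for the difference in means
res$estimate    # the two sample means
```

With only two observations per group the test has very little power, so this example is best read as a demonstration of the call rather than as a meaningful analysis.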

5. plot()

• Description: This function creates a scatter plot or other types of plots based on the
input data.

• Syntax: plot(x, y)

• Example:

    plot(df$age, df$salary)

6. lm()

• Description: This function fits linear models to the data.

• Syntax: lm(formula, data)

• Example:

    model <- lm(salary ~ age, data = df)
    summary(model)
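Beyond summary(), the fitted model can be queried directly. The sketch below rebuilds the small age/salary data frame from the summary() example and uses coef() and predict(); the new age value 45 is just an illustrative input:

```r
df <- data.frame(
  age = c(25, 30, 35, 40),
  salary = c(50000, 60000, 70000, 80000)
)

model <- lm(salary ~ age, data = df)

coef(model)                                      # intercept and slope of the fitted line
predict(model, newdata = data.frame(age = 45))   # predicted salary for a new age
```

Because these four points lie exactly on a straight line (salary rises by 2000 per year of age), R may warn about an "essentially perfect fit" when summary(model) is called. Real data will normally have residual scatter.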

These functions are just a few examples of the many available in R for performing data analysis tasks
effectively.

Statistical Functions in R

R provides a wide array of statistical functions that facilitate data analysis. Below are some key
statistical functions along with their descriptions, syntax, and examples.

1. Mean Function

• Description: Calculates the average of a numeric vector.

• Syntax: mean(x, na.rm = FALSE)

   ◦ x: A numeric vector.

   ◦ na.rm: Logical value indicating whether to remove missing values (NA).

• Example:

    data <- c(2, 3, 5, NA, 7)
    mean(data, na.rm = TRUE) # Output: 4.25

2. Standard Deviation Function

• Description: Computes the sample standard deviation of a numeric vector.

• Syntax: sd(x, na.rm = FALSE)

   ◦ x: A numeric vector.

   ◦ na.rm: Logical value indicating whether to remove missing values (NA).

• Example:

    data <- c(2, 3, 5, NA, 7)
    sd(data, na.rm = TRUE) # Output: 2.22 (rounded)

3. Variance Function

• Description: Calculates the sample variance of a numeric vector.

• Syntax: var(x, na.rm = FALSE)

   ◦ x: A numeric vector.

   ◦ na.rm: Logical value indicating whether to remove missing values (NA).

• Example:

    data <- c(2, 3, 5, NA, 7)
    var(data, na.rm = TRUE) # Output: 4.92 (rounded)
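Both var() and sd() use the sample formulas, dividing by n - 1 rather than n. The following sketch recomputes them by hand for the same vector to make that explicit; the names x, manual_var and manual_sd are illustrative:

```r
data <- c(2, 3, 5, NA, 7)

x <- data[!is.na(data)]   # drop missing values, the same effect as na.rm = TRUE
n <- length(x)

manual_var <- sum((x - mean(x))^2) / (n - 1)   # sample variance
manual_sd  <- sqrt(manual_var)                 # sample standard deviation

all.equal(manual_var, var(data, na.rm = TRUE))   # TRUE
all.equal(manual_sd,  sd(data, na.rm = TRUE))    # TRUE
```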

4. Median Function

• Description: Finds the median value of a numeric vector.

• Syntax: median(x, na.rm = FALSE)

   ◦ x: A numeric vector.

   ◦ na.rm: Logical value indicating whether to remove missing values (NA).

• Example:

    data <- c(2, 3, 5, NA, 7)
    median(data, na.rm = TRUE) # Output: 4 (the average of the two middle values, 3 and 5)

5. Summary Function

• Description: Provides summary statistics for an object, including the minimum, quartiles,
median, mean, and maximum.

• Syntax: summary(object)

• Example:

    data <- c(2, 3, 5, NA, 7)
    summary(data)
    # Output:
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    #    2.00    2.75    4.00    4.25    5.50    7.00       1
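The quartiles that summary() reports can be obtained individually with quantile(). A short sketch on the same vector; by default R interpolates linearly between neighbouring values, which is why the quartiles need not be values that occur in the data:

```r
data <- c(2, 3, 5, NA, 7)

# First quartile, median and third quartile, ignoring the missing value
q <- quantile(data, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
q
#  25%  50%  75%
# 2.75 4.00 5.50

sum(is.na(data))   # number of missing values, reported by summary() as NA's
```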

6. T-Test Function

• Description: Performs one-sample or two-sample t-tests to compare means.

• Syntax: t.test(x) for one sample; t.test(x, y) or t.test(values ~ group, data = df) for two
samples. The formula form needs a numeric column and a grouping column with exactly two levels.

• Example (One-Sample):

    data <- c(2, 3, 5)
    t.test(data)
    # Output includes the t-value and p-value for the test against mean = 0

• Example (Two-Sample):

    group1 <- c(2, 3)
    group2 <- c(5, 7)
    t.test(group1, group2)
    # Output includes the t-value and p-value comparing the means of group1 and group2
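The formula form of t.test() expects the measurements in one column and the group labels in another. A minimal sketch with a hypothetical data frame named scores (the values are made up), showing that the formula call and the two-vector call give the same test:

```r
# Hypothetical long-format data: one numeric column, one two-level grouping column
scores <- data.frame(
  value = c(2, 3, 4, 5, 7, 8),
  group = factor(rep(c("a", "b"), each = 3))
)

res_formula <- t.test(value ~ group, data = scores)

# Equivalent call with the two groups passed as separate vectors
res_vectors <- t.test(scores$value[scores$group == "a"],
                      scores$value[scores$group == "b"])

all.equal(res_formula$p.value, res_vectors$p.value)   # TRUE
```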

7. Correlation Function

• Description: Computes the correlation coefficient between two variables.

• Syntax: cor(x, y, use = "everything", method = "pearson")

• Example:

    x <- c(1, 2, 3)
    y <- c(4, 5, 6)
    cor(x, y) # Output: [1] 1 (perfect positive correlation)
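The method and use arguments change what cor() computes. The sketch below uses illustrative vectors: a monotonic but non-linear relationship, where Pearson and Spearman disagree, and a vector with a missing value, where use matters:

```r
x <- c(1, 2, 3, 4, 5)
y <- x^2                          # increases with x, but not linearly

cor(x, y)                         # Pearson: high, but below 1
cor(x, y, method = "spearman")    # Spearman rank correlation: exactly 1

x_na <- c(1, 2, NA, 4, 5)
cor(x_na, y)                          # NA, because use = "everything" is the default
cor(x_na, y, use = "complete.obs")    # drops the incomplete pair first
```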

These functions are fundamental for performing various statistical analyses in R.

1. Introduction to Inferential Statistics in R

Inferential statistics allows researchers to make conclusions about a population based on sample
data. In R, various functions are available for performing inferential statistical tests such as t-tests,
chi-squared tests, ANOVA, and regression analysis. Below are some commonly used inferential
functions along with their syntax and examples.

2. t-test Functions

The t-test is used to compare the means of two groups.

• Independent t-test

Syntax:

    t.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

Example:

    # Sample data
    group1 <- c(5, 6, 7, 8)
    group2 <- c(3, 4, 5, 6)

    # Perform independent t-test
    result <- t.test(group1, group2)
    print(result)

• Paired t-test

Syntax:

    t.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = TRUE, var.equal = FALSE, conf.level = 0.95)

Example:

    # Sample data before and after treatment
    before <- c(78, 65, 71)
    after <- c(71, 62, 70)

    # Perform paired t-test
    result <- t.test(before, after, paired = TRUE)
    print(result)
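A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which is a useful way to understand what paired = TRUE does. A brief sketch with the same before/after values; the names paired and one_sample are illustrative:

```r
before <- c(78, 65, 71)
after  <- c(71, 62, 70)

paired     <- t.test(before, after, paired = TRUE)
one_sample <- t.test(before - after, mu = 0)   # test the differences against 0

all.equal(unname(paired$statistic), unname(one_sample$statistic))   # TRUE
all.equal(paired$p.value, one_sample$p.value)                       # TRUE
```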

3. Chi-Squared Test Function

The chi-squared test is used to determine if there is a significant association between categorical
variables.

• Chi-Squared Test

Syntax:

    chisq.test(x)

Example:

    # Sample contingency table
    data <- matrix(c(10, 20, 30, 40), nrow = 2)

    # Perform chi-squared test
    result <- chisq.test(data)
    print(result)
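The object returned by chisq.test() also holds the expected counts under independence, and for 2x2 tables R applies Yates' continuity correction unless told otherwise. A sketch on the same table; the name uncorrected is illustrative:

```r
data <- matrix(c(10, 20, 30, 40), nrow = 2)

result <- chisq.test(data)   # 2x2 table: continuity correction is on by default
result$expected              # expected counts: row total * column total / grand total

uncorrected <- chisq.test(data, correct = FALSE)   # plain Pearson statistic
uncorrected$statistic
```

Inspecting the expected counts is a common sanity check, because the chi-squared approximation is usually considered unreliable when they are very small.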

4. ANOVA Function

ANOVA (Analysis of Variance) is used when comparing means across three or more groups.

• ANOVA Function

Syntax:

    aov(formula, data)

Example:

    # Sample data frame
    set.seed(123) # makes the random values reproducible
    df <- data.frame(
      group = rep(c("A", "B", "C"), each = 10),
      values = c(rnorm(10), rnorm(10), rnorm(10))
    )

    # Perform ANOVA
    result <- aov(values ~ group, data = df)
    summary(result)
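A significant ANOVA only indicates that at least one group mean differs; a post-hoc test such as Tukey's HSD shows which pairs differ. The sketch below is an illustration under assumptions: the seed and the shifted mean of group C are arbitrary choices made so that there is something to detect.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

df <- data.frame(
  group  = factor(rep(c("A", "B", "C"), each = 10)),
  values = c(rnorm(10, mean = 0), rnorm(10, mean = 0), rnorm(10, mean = 2))
)

result <- aov(values ~ group, data = df)
summary(result)

tukey <- TukeyHSD(result)   # all pairwise comparisons with adjusted p-values
tukey$group                 # columns: diff, lwr, upr, p adj
```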

5. Linear Regression Function

Linear regression is used to model the relationship between a dependent variable and one or more
independent variables.

• Linear Regression Function

Syntax:

    lm(formula, data)

Example:

    # Sample data frame for linear regression
    df <- data.frame(
      x = rnorm(100),
      y = rnorm(100)
    )

    # Perform linear regression
    model <- lm(y ~ x, data = df)
    summary(model)

6. Conclusion

These functions provide a foundation for conducting inferential statistics in R. Used with
appropriate datasets and sound hypothesis-testing approaches, they can lead to meaningful
insights from your analyses.

Data Manipulation Functions in R

In R, the dplyr package is a powerful tool for data manipulation. It provides a variety of functions that
allow users to perform common data manipulation tasks efficiently and effectively. Below are some
of the key functions available in the dplyr package:

1. filter()

The filter() function is used to subset rows based on specific conditions. You can use logical
operators to specify these conditions. In the examples below, stats is assumed to be a data frame
with player, runs, and wickets columns.

• Syntax: filter(dataframeName, condition)

• Example: To filter players who scored more than 100 runs:

    filtered_data <- filter(stats, runs > 100)

2. distinct()

The distinct() function removes duplicate rows from a data frame, either across all columns or
based on specified columns.

• Syntax: distinct(dataframeName, col1, col2, ..., .keep_all = TRUE)

• Example: To remove duplicates based on player names:

    unique_players <- distinct(stats, player) # returns only the player column; add .keep_all = TRUE to keep the others

3. arrange()

The arrange() function orders the rows of a data frame based on one or more columns.

• Syntax: arrange(dataframeName, columnName)

• Example: To sort players by their runs in ascending order:

    sorted_data <- arrange(stats, runs)

4. select()

The select() function extracts specific columns from a data frame.

• Syntax: select(dataframeName, col1, col2, ...)

• Example: To select only the player and wickets columns:

    selected_columns <- select(stats, player, wickets)

5. rename()

The rename() function changes the names of columns in a data frame.

• Syntax: rename(dataframeName, newName = oldName)

• Example: To rename ‘runs’ to ‘runs_scored’:

    renamed_data <- rename(stats, runs_scored = runs)

6. mutate() & transmute()

These functions are used to create new variables. The mutate() function adds new variables while
keeping existing ones; the transmute() function creates new variables but drops existing ones.

• Syntax for mutate: mutate(dataframeName, newVariable = formula)

• Syntax for transmute: transmute(dataframeName, newVariable = formula)

• Example:

    data_with_avg <- mutate(stats, avg = runs / wickets) # keeps old variables
    data_with_only_avg <- transmute(stats, avg = runs / wickets) # drops old variables

7. summarize()

The summarize() function aggregates data using summary statistics like mean or sum.

• Syntax: summarize(dataframeName, aggregate_function(columnName))

• Example:

    summary_stats <- summarize(stats, total_runs = sum(runs), average_runs = mean(runs))

These functions can be combined using the pipe operator (%>%) to create complex data
manipulation workflows that are both readable and efficient.
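As an illustration of such a workflow, the sketch below chains several of the verbs above with %>%. It assumes the dplyr package is installed, and the stats data frame is hypothetical: its values are made up for the example, and the column name runs_per_wicket is likewise illustrative.

```r
library(dplyr)

# Hypothetical cricket statistics; player C appears twice on purpose
stats <- data.frame(
  player  = c("A", "B", "C", "C", "D"),
  runs    = c(120, 85, 240, 240, 150),
  wickets = c(4, 10, 6, 6, 10)
)

result <- stats %>%
  distinct() %>%                                  # drop the duplicated row
  filter(runs > 100) %>%                          # keep players with more than 100 runs
  mutate(runs_per_wicket = runs / wickets) %>%    # add a derived column
  arrange(desc(runs_per_wicket)) %>%              # highest ratio first
  select(player, runs_per_wicket)                 # keep only the columns of interest

result
```

Each step receives the output of the previous one as its first argument, so the pipeline reads top to bottom in the order the operations are applied.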

Data Visualization Functions in R

Data visualization in R is facilitated through various functions and packages that allow users to create
a wide range of graphical representations of data. Below are some of the key functions and their
respective uses:

1. Base R Plotting Functions

• plot(): This is the most basic function for creating scatter plots. It can be used for plotting
two variables against each other.

    plot(x, y)

• hist(): Used to create histograms, which display the distribution of a continuous variable.

    hist(data$variable)

• barplot(): This function creates bar plots for categorical data.

    barplot(table(data$categorical_variable))

• boxplot(): Generates box plots to visualize the distribution of a dataset based on summary
statistics (minimum, first quartile, median, third quartile, and maximum).

    boxplot(data$variable ~ data$grouping_variable)
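To make these calls concrete, the sketch below applies them to mtcars, a dataset that ships with R and stands in here for the generic data object above. The axis labels are illustrative choices. hist() and boxplot() also return the numbers behind the picture, which the last two lines inspect.

```r
# Histogram of a continuous variable
h <- hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Bar plot of counts per category
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Number of cars")

# Box plot of a variable by group; the formula plus data = form avoids repeating mtcars$
b <- boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot of two variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

h$counts   # bin counts used by the histogram
b$stats    # one column of five summary statistics per group
```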

2. ggplot2 Package

The ggplot2 package is one of the most popular libraries for data visualization in R due to its
flexibility and powerful capabilities.

• ggplot(): Initializes a ggplot object. It usually takes a dataset as its first argument; layers
are then added with +.

    ggplot(data = dataset) + geom_point(mapping = aes(x = x_variable, y = y_variable))

• geom_point(): Adds points to a scatter plot.

• geom_line(): Creates line graphs by connecting points with lines.

• geom_bar(): Used for creating bar charts; by default it counts the occurrences of each category.

• geom_histogram(): Similar to hist() but integrates seamlessly into the ggplot framework.

• facet_wrap() and facet_grid(): These functions allow you to create multiple panels based on
one or more categorical variables, making it easier to compare subsets of your data.

    ggplot(data) + geom_point(aes(x = x_variable, y = y_variable)) + facet_wrap(~ categorical_variable)
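A runnable version of the faceted scatter plot, again using the built-in mtcars dataset in place of the placeholder names. This is a sketch that assumes the ggplot2 package is installed; the axis labels are illustrative.

```r
library(ggplot2)

# Scatter plot of fuel economy against weight, one panel per cylinder count
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)   # inside functions, loops, or source()d files a ggplot must be printed to be drawn
```

Storing the plot in a variable first, as done here with p, makes it easy to add further layers later or to pass the plot on to other functions.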

3. Lattice Package

The lattice package provides another system for creating visualizations in R.

• xyplot(): For creating scatter plots with conditioning variables.

    library(lattice)
    xyplot(y ~ x | factor(variable), data = dataset)

• barchart(), histogram(), and densityplot() are also available within this package for specific
types of visualizations.

4. Other Useful Packages

Several other packages enhance visualization capabilities in R:

• Plotly: For interactive plots that can be embedded in web applications.

    library(ggplot2)
    library(plotly)
    p <- ggplot(data, aes(x = x_variable, y = y_variable)) + geom_point()
    ggplotly(p)

• Highcharter: Another package for creating interactive visualizations using the Highcharts
JavaScript library.

In conclusion, R provides a rich ecosystem of functions and packages dedicated to data visualization.
Users can choose from base R plotting functions or leverage advanced libraries like ggplot2, lattice,
and others depending on their specific needs.
