Muthayammal College of Arts and Science Rasipuram: Assignment No - 1

This document provides information about built-in functions, graphics, and data handling in R programming. It discusses various built-in mathematical, statistical, data manipulation and visualization functions in R. It also describes different types of graphs that can be created such as bar plots, pie charts, histograms, scatter plots and box plots. Further, it outlines various methods for inputting, accessing and indexing data in R including reading data from files, filtering, sorting and merging datasets.

Muthayammal College of Arts And Science

Rasipuram
Assignment No - 1

Name : K.Haritha

Roll no : 21UST004

Department : III- B.Sc., Statistics

Subject : R Programming For Data Analysis

Date :

K.Haritha

Student Signature Staff Signature


R PROGRAMMING FOR DATA ANALYSIS
UNIT -1

1. BUILT-IN FUNCTIONS

R is a popular statistical computing and data analysis programming language. The
built-in functions in R support data manipulation, summary statistics, filtering, and generating
random numbers. These built-in functions are broadly categorized into the following
categories based on the operations they perform:

• Mathematical Functions
• Statistical Probability Functions
• String Functions
• Other Statistical Functions

1. Mathematical Functions

sqrt(): Square root

abs(): Absolute value

log(): Natural logarithm

exp(): Exponential function
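
The mathematical functions above can be tried on a small vector. This is a minimal sketch; the vector `x` is made-up sample data, not part of the assignment.

```r
# Hypothetical sample values (assumed for illustration)
x <- c(-4, 9, 16)

abs_x <- abs(x)       # absolute values: 4 9 16
roots <- sqrt(abs_x)  # square roots:    2 3 4

log(exp(1))           # natural log of e is 1
exp(0)                # e raised to 0 is 1
log(100, base = 10)   # log() also accepts another base
```

Note that `sqrt()` of a negative number returns NaN with a warning, which is why `abs()` is applied first here.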

2. Statistical Functions

mean(): Mean (average)

median(): Median

sd(): Standard deviation

var(): Variance
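
A short sketch of the statistical functions, using an assumed vector of five marks (`marks` is illustrative data only):

```r
# Hypothetical marks of five students
marks <- c(56, 72, 64, 88, 70)

avg    <- mean(marks)    # 70
mid    <- median(marks)  # 70
spread <- var(marks)     # 140 (sample variance, divides by n - 1)
sdev   <- sd(marks)      # square root of the variance, about 11.83
```

Both `var()` and `sd()` use the sample (n - 1) formula, not the population formula.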

3. Data Manipulation Functions

subset(): Subset of data

merge(): Merge datasets


rbind(): Combine data frames by rows

cbind(): Combine data frames by columns
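
The four data manipulation functions can be seen together in a small sketch. The data frames `students` and `scores` and their columns are assumed names for illustration.

```r
# Two hypothetical data frames sharing the key column "Roll"
students <- data.frame(Roll = c(1, 2, 3),
                       Name = c("Alice", "Bob", "Charlie"),
                       stringsAsFactors = FALSE)
scores   <- data.frame(Roll = c(1, 2, 3), Score = c(78, 65, 90))

combined  <- merge(students, scores, by = "Roll")    # join on Roll
high      <- subset(combined, Score > 70)            # keep rows with Score > 70
more      <- rbind(students,
                   data.frame(Roll = 4, Name = "Dina",
                              stringsAsFactors = FALSE))  # add one row
with_flag <- cbind(scores, Passed = scores$Score >= 70)   # add one column
```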

4. Data Inspection Functions

head(): Display the first part of a data frame

tail(): Display the last part of a data frame

summary(): Summary statistics
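
As a quick illustration of the inspection functions, the sketch below uses `mtcars`, a data set that ships with R (its use here is an assumption for the example; the assignment does not name a data set).

```r
# mtcars is a built-in data frame with 32 rows
first_rows  <- head(mtcars, 3)       # first 3 rows
last_rows   <- tail(mtcars, 3)       # last 3 rows
mpg_summary <- summary(mtcars$mpg)   # min, quartiles, median, mean, max

print(first_rows)
print(mpg_summary)
```

Without a second argument, `head()` and `tail()` show six rows.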

5. Data Visualization Functions

plot(): Create scatterplots, line plots, etc.

hist(): Create histograms

boxplot(): Create boxplots

6. String Manipulation Functions

paste(): Conca…

8. File Handling Functions

read.csv(): Read data from a CSV file

write.csv(): Write data to a CSV file

load(): Load saved objects

9. Statistical Modeling Functions

lm(): Linear regression

glm(): Generalized linear models

t.test(): Perform t-test
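
A minimal sketch of two of the modeling functions, again assuming the built-in `mtcars` data set as example data:

```r
# Linear regression of fuel efficiency (mpg) on weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope

# Two-sample t-test comparing mpg between the two transmission groups (am)
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value
```

`summary(fit)` gives the full regression table, including standard errors and R-squared.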


2. GRAPHICS IN R
R can draw many kinds of charts and graphs, for example bar plot, box plot,
mosaic plot, dot chart, coplot, histogram, pie chart, scatter plot, etc.

Types of R – Charts

• Bar Plot or Bar Chart
• Pie Diagram or Pie Chart
• Histogram
• Scatter Plot
• Box Plot

Bar Plot or Bar Chart

A bar plot (bar chart) in R represents the values in a data vector as the heights of
bars. The data vector passed to the function is plotted along the y-axis. A bar chart
can show frequencies, much like a histogram, if you pass the result of the table()
function instead of the raw data vector.

Syntax: barplot(data, xlab, ylab)

where

data is the data vector to be represented on y-axis

xlab is the label given to x-axis

ylab is the label given to y-axis
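
The two uses of barplot() described above might look like this. The vectors `sales` and `grades` are made-up data for illustration.

```r
# Bar heights taken directly from a named data vector
sales <- c(Mon = 12, Tue = 18, Wed = 9)
mids  <- barplot(sales, xlab = "Day", ylab = "Units sold")

# Frequencies of each category, computed with table()
grades <- c("A", "B", "A", "C", "B", "A")
counts <- table(grades)
barplot(counts, xlab = "Grade", ylab = "Frequency")
```

barplot() returns the x-positions of the bar midpoints (stored in `mids` here), which is useful when adding text above the bars.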

Pie Diagram or Pie Chart

A pie chart is a circular chart divided into segments in proportion to the data
provided. The whole pie represents 100% and each segment shows its fraction of the
whole. It is another way to represent statistical data graphically, and the pie()
function is used to draw it.

Syntax: pie(x, labels, col, main, radius)


where

x is data vector

labels shows names given to slices

col fills the color in the slices as given parameter

main shows title name of the pie chart

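radius indicates the radius of the pie chart. The pie is drawn centered in a square box whose sides range from -1 to +1, so the radius is normally a value between 0 and 1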
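
A small sketch of pie() with the parameters listed above. The crop figures and colours are assumed values for illustration.

```r
# Hypothetical quantities; pie() scales them to fractions of the whole
share  <- c(8, 7, 5)
crops  <- c("Rice", "Wheat", "Maize")
pct    <- round(100 * share / sum(share))   # 40 35 25
labels <- paste0(crops, " (", pct, "%)")

pie(share, labels = labels,
    col = c("tomato", "gold", "skyblue"),
    main = "Crop share", radius = 0.8)
```

The data do not need to add up to 100; pie() works out each slice from its share of the total.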

Histogram

A histogram is a graph whose bars represent the frequency of grouped data in a
vector. It looks like a bar chart, but the difference is that a histogram shows how
many values fall into each interval (bin) rather than the data values themselves.

Syntax: hist(x, col, border, main, xlab, ylab)

where

x is data vector

col specifies the color of the bars to be filled

border specifies the color of border of bars

main specifies the title name of histogram

xlab specifies the x-axis label

ylab specifies the y-axis label
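
The sketch below draws a histogram of simulated heights. The data are generated with rnorm() purely for illustration, and the colours are arbitrary choices.

```r
set.seed(1)                                  # make the random data repeatable
heights <- rnorm(100, mean = 165, sd = 8)    # 100 simulated heights in cm

h <- hist(heights, col = "lightblue", border = "white",
          main = "Heights of 100 students",
          xlab = "Height (cm)", ylab = "Frequency")

h$breaks   # the bin boundaries chosen by hist()
h$counts   # number of values in each bin
```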

Scatter Plot

A scatter plot is another type of graph, which plots points to show the
relationship between two data vectors. One data vector is represented on the x-axis
and the other on the y-axis.

Syntax: plot(x, y, type, xlab, ylab, main)


Where

x is the data vector represented on x-axis

y is the data vector represented on y-axis

type specifies the type of plot to be drawn. For example, "l" for lines, "p" for points,
"s" for stair steps, etc.

xlab specifies the label for x-axis

ylab specifies the label for y-axis

main specifies the title name of the graph
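
A minimal scatter plot sketch, assuming two made-up vectors `hours` and `score`:

```r
# Hypothetical study hours and test scores for six students
hours <- c(1, 2, 3, 4, 5, 6)
score <- c(52, 55, 61, 64, 70, 75)

plot(hours, score, type = "p",
     xlab = "Hours studied", ylab = "Score",
     main = "Score against hours studied")

r <- cor(hours, score)   # correlation coefficient of the two vectors
```

Changing type to "l" joins the same points with lines, and "b" draws both points and lines.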

Box Plot

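A box plot shows how the values in a data vector are distributed. It summarizes five
values: the minimum, first quartile, second quartile (median), third quartile, and
maximum of the data vector. By default, boxplot() draws values lying more than 1.5
times the interquartile range beyond the box as separate outlier points, so the
whiskers end at the most extreme values that are not outliers.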

Syntax: boxplot(x, xlab, ylab, notch)

where

x specifies the data vector

xlab specifies the label for x-axis

ylab specifies the label for y-axis

notch, if TRUE then creates notch on both the sides of the box
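
The five values can be checked numerically with fivenum() and compared with what boxplot() draws. The vector `x` is assumed sample data containing one unusually large value.

```r
# Hypothetical data with one large value (40)
x <- c(12, 15, 17, 18, 20, 21, 23, 24, 40)

five <- fivenum(x)   # 12 17 20 23 40: min, lower hinge, median, upper hinge, max

b <- boxplot(x, xlab = "Sample", ylab = "Value", notch = FALSE)
b$stats   # whisker ends, hinges and median actually drawn
b$out     # 40 is drawn as a separate outlier point
```

Here the upper whisker stops at 24 because 40 lies more than 1.5 times the interquartile range above the box.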

3. DATA INPUTTING, DATA ACCESSING AND INDEXING

DATA INPUTTING METHOD

1. Entering Data Manually

You can create a data frame by entering data manually using the `data.frame()` function. For
example:

# Create a data frame with two columns
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

2. Reading Data from a CSV File

If your data is in a CSV (Comma-Separated Values) file, you can use the `read.csv()` function
to import it into R. For example:

# Read data from a CSV file

my_data <- read.csv("data.csv")
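
To try read.csv() without an existing file, the sketch below first writes a small data frame to a temporary CSV file and then reads it back. The temporary path and the `people` data frame are assumptions for the example.

```r
# Write a small data frame to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
people <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                     Age = c(25, 30, 22))
write.csv(people, csv_path, row.names = FALSE)   # do not store row numbers

# Read the file back into R and check its structure
my_data <- read.csv(csv_path)
str(my_data)
```

Setting `row.names = FALSE` keeps write.csv() from adding an extra column of row numbers to the file.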

3. Reading Data from Excel

To read data from an Excel file, you can use packages like `readxl` or `openxlsx`. For
example, with the `readxl` package:

# Install and load the readxl package

install.packages("readxl")

library(readxl)

# Read data from an Excel file

my_data <- read_excel("data.xlsx")


4. Reading Data from a URL

You can read data directly from a URL using functions like `read.csv()` or other specific
functions based on the data format.

5. Generating Synthetic Data

R provides various functions to generate synthetic data, such as `rnorm()` for generating
random numbers following a normal distribution.
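
A minimal sketch of generating synthetic data. The seed value, mean and standard deviation are arbitrary choices for illustration.

```r
# Setting a seed makes the random numbers repeatable
set.seed(42)
sample_a <- rnorm(5, mean = 50, sd = 10)   # 5 values from a normal distribution

set.seed(42)
sample_b <- rnorm(5, mean = 50, sd = 10)   # same seed, so the same 5 values

identical(sample_a, sample_b)              # TRUE

uniform <- runif(3)   # 3 values from a uniform distribution on (0, 1)
```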

6. Using External Packages for Specific Data Sources

Depending on your data source, you may need to use specialized packages. For instance, the
`jsonlite` package can be used to read JSON data, and the `httr` package can help you retrieve data
from web APIs.

DATA ACCESSING

1. Indexing Data Frames

You can access specific rows and columns in a data frame using square brackets `[ ]`.

For example:

# Access the first row of a data frame

first_row <- my_data[1, ]

# Access the "Name" column

name_column <- my_data$Name

# Access multiple columns

selected_columns <- my_data[, c("Name", "Age")]


2. Filtering Data

You can filter rows based on specific conditions using logical indexing. For example:

# Filter rows where Age is greater than 25

older_than_25 <- my_data[my_data$Age > 25, ]

3. Subsetting Data

You can create subsets of your data based on criteria. For example:

# Create a subset of data for people named "Alice"

alice_data <- subset(my_data, Name == "Alice")

4. Sorting Data

You can sort your data by one or more columns using the `order()` function. For example:

# Sort data by Age in ascending order

sorted_data <- my_data[order(my_data$Age), ]

5. Aggregating Data

You can compute summary statistics on your data using functions like `sum()`, `mean()`,
`median()`, etc., or use the `aggregate()` function for more complex aggregations.
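
For example, aggregate() can compute a total for each group. The `sales` data frame below is made-up data for illustration.

```r
# Hypothetical sales records for two regions
sales <- data.frame(Region = c("North", "South", "North", "South"),
                    Amount = c(100, 150, 200, 50))

# Total Amount for each Region
totals <- aggregate(Amount ~ Region, data = sales, FUN = sum)
print(totals)   # North 300, South 200

overall_mean <- mean(sales$Amount)   # 125
```

Replacing `FUN = sum` with `FUN = mean` gives the average per region instead.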

6. Merging Data

You can merge two or more data frames together based on common columns using functions
like `merge()` or `dplyr` package functions like `left_join()`, `inner_join()`, etc.
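
A sketch of merge() on a common column, using base R only. The data frames `customers` and `orders` are assumed examples; `all.x = TRUE` gives the base R equivalent of a left join.

```r
customers <- data.frame(ID = c(1, 2, 3),
                        Name = c("Alice", "Bob", "Charlie"))
orders    <- data.frame(ID = c(1, 3, 3),
                        Item = c("Pen", "Book", "Bag"))

# Inner join: only IDs present in both data frames (3 rows)
inner <- merge(customers, orders, by = "ID")

# Left join: keep every customer; Bob has no order, so his Item is NA (4 rows)
left <- merge(customers, orders, by = "ID", all.x = TRUE)
```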

7. Accessing Lists and Matrices

You can access elements in lists and matrices using indexing similar to data frames, but with
double square brackets `[[]]` for lists and single brackets `[]` for matrices.
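
A short sketch of both kinds of access. The list `record` and the matrix `m` are assumed examples.

```r
# A list can hold elements of different types
record <- list(name = "Alice", scores = c(80, 90))
record[["scores"]]   # the element itself: 80 90
record$name          # same idea, using the element name: "Alice"
record["name"]       # single brackets return a smaller list, not the element

# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)
m[2, 3]   # element in row 2, column 3: 6
m[1, ]    # whole first row: 1 3 5
m[, 2]    # whole second column: 3 4
```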

INDEXING

Indexing in R refers to the process of accessing or retrieving specific elements from
data structures like vectors, matrices, or data frames. In R, indexing typically involves using
numerical or logical indices to pinpoint particular elements within a data structure.

For example:

• Vector Indexing

In a vector, you can access elements by their position using square brackets. For
instance, my_vector[3] retrieves the third element of the vector.

• Matrix Indexing

For matrices, you can use row and column indices to access specific elements. For
example, my_matrix[2, 4] retrieves the element in the second row and fourth column.

• Data Frame Indexing

Data frames allow indexing by both row and column, such as my_data_frame[1,
"Name"] to access the value in the "Name" column of the first row.

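The three kinds of indexing can be tried together in one sketch. The objects below reuse the names from the text, with assumed contents.

```r
# Vector indexing (positions start at 1)
my_vector <- c(10, 20, 30, 40)
my_vector[3]                 # third element: 30
my_vector[my_vector > 15]    # logical index: 20 30 40
my_vector[-1]                # negative index drops the first element

# Matrix indexing: [row, column]
my_matrix <- matrix(1:8, nrow = 2)   # 2 rows, 4 columns, filled by column
my_matrix[2, 4]                      # second row, fourth column: 8

# Data frame indexing: [row, column name]
my_data_frame <- data.frame(Name = c("Alice", "Bob"),
                            Age = c(25, 30),
                            stringsAsFactors = FALSE)
my_data_frame[1, "Name"]             # "Alice"
```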