Mod3 Tables EPP

The document outlines various R functions for reading and writing different file formats including CSV, Excel, JSON, and XML, along with their purposes and examples. It also details string manipulation functions and data frame operations such as adding columns, removing missing values, and reshaping data. Additionally, it covers grouping functions for data analysis, emphasizing the application of these functions in data cleaning and transformation.

| Topic | Key Points | Functions/Examples |
| --- | --- | --- |
| Reading and Writing CSV Files | Functions: read.table(), read.csv() | Reading CSV: deer_data <- read.table("file_path", header = TRUE, fill = TRUE, sep = ",") |
| | Arguments: header, fill, sep, nrows, skip | Writing CSV: write.csv(deer_data, "F:/deer.csv", row.names = FALSE, fileEncoding = "utf8") |
| | Other functions: read.csv2(), read.delim() | |
| | Special imports: na.strings, for the missing-value markers used by other formats (SQL, SAS, Excel) | |
| Unstructured Files | Purpose: Files without consistent formats (e.g., text files, logs) | Reading: tempest <- readLines("F:/Tempest.txt") |
| | Functions: readLines(), writeLines() | Writing: writeLines("This is a line.", "F:/story.txt") |
| XML and HTML Files | Purpose: For hierarchical data storage (e.g., RSS, SOAP) | XML Parsing: r_options <- xmlParse(xml_file) |
| | Functions: xmlParse(), htmlParse(), xmlTreeParse() | |
| | Packages: XML | |
| JSON and YAML Files | Purpose: JSON for lightweight data exchange, YAML for configuration files | Reading JSON: parsed_data <- fromJSON(data) |
| | Functions: fromJSON(), toJSON(), yaml.load(), as.yaml() | Reading YAML: parsed_yaml <- yaml.load(yaml_data) |
| | Packages: RJSONIO, yaml | |
| Excel Files | Purpose: For data analysis and storage (e.g., .xls, .xlsx) | Reading Excel: logfile <- read.xlsx2("F:/Log2015.xls", sheetIndex = 1, startRow = 2) |
| | Packages: xlsx | Writing Excel: write.xlsx2() |
| SAS, SPSS, MATLAB Files | Purpose: For statistical data formats | SAS: read.ssd(); SPSS: read.spss(); MATLAB: readMat() |
| | Packages: foreign, R.matlab | |
| Web Data | Purpose: Import data from the web via APIs or URLs | Direct URL Read: cancer_data <- read.csv(cancer_url) |
| | Functions: read.table(), read.csv(), download.file() | Download: download.file(cancer_url, "cancer.csv") |
| Databases | Purpose: Connect to DBMS (e.g., SQLite, MySQL, PostgreSQL) | SQLite Example: conn <- dbConnect(driver, db_file) |
| | Packages: DBI, RSQLite, RMySQL, RODBC, RMongo | Query Execution: dbGetQuery(conn, "SQL QUERY") |
| | Functions: dbConnect(), dbGetQuery(), dbListTables(), dbReadTable(), dbDisconnect() | MongoDB: Use RMongo |
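
As a minimal sketch of the CSV functions in the table above, the following round trip writes a small data frame and reads it back. The sample data, its column names, and the use of a temporary file (instead of a fixed path such as "F:/deer.csv") are assumptions made for illustration; only base R is used.

```r
# Hypothetical sample data; the column names are illustrative only.
deer_data <- data.frame(
  skull_length = c(31.2, NA, 29.8),
  sex = c("F", "M", "F")
)

# Write to a temporary file so the example runs on any machine.
csv_path <- tempfile(fileext = ".csv")
write.csv(deer_data, csv_path, row.names = FALSE)

# Read it back; na.strings lists the strings that should become NA.
deer_copy <- read.csv(csv_path, header = TRUE, na.strings = c("NA", ""))

# read.table() reads the same file once the separator is given explicitly.
deer_table <- read.table(csv_path, header = TRUE, sep = ",", fill = TRUE)

str(deer_copy)
```

read.csv() is essentially read.table() with CSV-friendly defaults, which is why the second call needs sep = "," spelled out.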

Data Import and Export Functions in R

| Function | Basic Definition | Purpose | Syntax | Example | Output | Application |
| --- | --- | --- | --- | --- | --- | --- |
| read.csv() | Reads a CSV file into a data frame. | To import tabular data from a CSV file. | read.csv(file, header = TRUE, sep = ",") | data <- read.csv("data.csv", header = TRUE) | Data frame with CSV content | Used to analyze or process structured data from CSV files. |
| write.csv() | Writes a data frame to a CSV file. | To export data to a CSV file. | write.csv(data, file, row.names = FALSE) | write.csv(data, "output.csv", row.names = FALSE) | CSV file saved in the specified path | Save analysis results or transformed data for sharing or storage. |
| readLines() | Reads lines from a text file into a character vector. | To import unstructured text data. | readLines(con = "file") | lines <- readLines("story.txt") | Character vector with file content | Useful for processing logs, scripts, or any text-based files. |
| writeLines() | Writes a character vector to a text file. | To save processed text data to a file. | writeLines(text, con = "file") | writeLines(c("Line 1", "Line 2"), "output.txt") | Text file with written lines | Store formatted or modified text for documentation or data exchange. |
| xmlParse() | Parses XML files into R objects. | To read and manipulate hierarchical XML data. | xmlParse(file, ...) | xml_data <- xmlParse("file.xml") | XML object parsed into R | Extracting or modifying data from XML formats such as RSS feeds or SOAP. |
| fromJSON() | Reads JSON data and converts it into R objects (like lists or data frames). | To import data in JSON format. | fromJSON(json_file) | parsed_data <- fromJSON('{"key": "value"}') | R list or data frame | Used in web scraping or handling APIs that return JSON-formatted data. |
| toJSON() | Converts R objects into JSON format for export. | To export data to JSON format. | toJSON(object, pretty = TRUE) | json_output <- toJSON(data, pretty = TRUE) | JSON string | Useful for sharing data with web applications or other programming environments. |
| read.xlsx() | Reads an Excel sheet into a data frame. | To import tabular data from Excel files. | read.xlsx(file, sheetIndex = 1, ...) | data <- read.xlsx("file.xlsx", sheetIndex = 1) | Data frame with Excel content | Common in business analytics or financial reporting tasks. |
| write.xlsx() | Writes a data frame to an Excel file. | To export data to Excel format. | write.xlsx(data, file, sheetName = "Sheet1") | write.xlsx(data, "output.xlsx") | Excel file saved in the specified path | For sharing analysis results in Excel format. |
| read.spss() | Reads SPSS .sav files into R. | To import SPSS data for analysis. | read.spss(file, ...) | data <- read.spss("data.sav", to.data.frame = TRUE) | Data frame with SPSS content | Allows integration of SPSS statistical data into R workflows. |
| readMat() | Reads MATLAB .mat files into R. | To import MATLAB data for use in R. | readMat(file) | mat_data <- readMat("file.mat") | List with MATLAB file content | Useful for cross-platform analysis when transitioning between R and MATLAB. |
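
The readLines() and writeLines() rows can be tried without any add-on package. The sketch below assumes a temporary file rather than the "output.txt" and "story.txt" names used in the table, so it leaves nothing behind in the working directory.

```r
# Write two lines to a temporary text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Line 1", "Line 2"), con = txt_path)

# Read them back as a character vector, one element per line.
lines <- readLines(txt_path)
print(lines)
length(lines)
```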


3.4.1 Manipulating Strings


| Function | Definition | Purpose | Syntax | Example | Output | Application |
| --- | --- | --- | --- | --- | --- | --- |
| grep() | Finds patterns in text. | Locate matching patterns. | grep(pattern, x) | grep("pen", c("pen", "book")) | 1 | Identifying rows or strings containing specific text. |
| grepl() | Logical test for patterns. | Checks if the pattern exists. | grepl(pattern, x) | grepl("pen", c("pen", "book")) | TRUE, FALSE | Filtering data based on text patterns. |
| sub() | Replaces the first instance of a pattern. | Modify strings by replacing content. | sub(pattern, replacement, x) | sub("my", "your", "This is my pen") | "This is your pen" | Adjusting labels or strings in datasets. |
| gsub() | Replaces all instances of a pattern. | Modify multiple occurrences. | gsub(pattern, replacement, x) | gsub("a", "o", "cat bat rat") | "cot bot rot" | Bulk string modifications. |
| str_detect() | Detects patterns in text (stringr). | Find if patterns exist. | str_detect(x, pattern) | str_detect("hello", "he") | TRUE | Searching for keywords. |
| str_replace() | Replaces patterns in text (stringr). | Replace a single pattern. | str_replace(x, pattern, replacement) | str_replace("hello", "l", "L") | "heLlo" | Reformatting or correcting text. |
| str_split() | Splits text into parts based on a pattern. | Divide strings into manageable parts. | str_split(x, pattern) | str_split("a,b,c", ",") | list(c("a", "b", "c")) | Parsing data into separate components. |
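
The base R functions from this table can be compared side by side, as in the short sketch below. The input vector is made up for illustration, and strsplit() is shown as the base R counterpart of str_split() so that the example does not depend on stringr being installed.

```r
labels <- c("pen", "book", "pencil")

# grep() returns the positions of matching elements,
# grepl() returns one TRUE/FALSE per element.
hits <- grep("pen", labels)
flags <- grepl("pen", labels)

# sub() changes only the first match, gsub() changes every match.
first_only <- sub("a", "o", "cat bat rat")
all_matches <- gsub("a", "o", "cat bat rat")

# strsplit() returns a list with one character vector per input string.
parts <- strsplit("a,b,c", ",")

print(hits)
print(flags)
print(first_only)
print(all_matches)
print(parts)
```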

3.4.2 Manipulating Data Frames

| Function | Definition | Purpose | Syntax | Example | Output | Application |
| --- | --- | --- | --- | --- | --- | --- |
| dataframe$new_col | Adds a new column to a data frame. | Extend the data frame with new variables. | dataframe$new_col <- values | df$new_col <- c(1, 2, 3) | Data frame with the new column. | Add calculated or derived fields. |
| na.omit() | Removes rows with missing values. | Clean the data by removing incomplete rows. | na.omit(dataframe) | na.omit(data.frame(a = c(1, NA))) | Rows without NA. | Data cleaning for further analysis. |
| complete.cases() | Checks which rows have no missing values. | Identify complete rows. | complete.cases(dataframe) | complete.cases(c(1, NA, 3)) | TRUE, FALSE, TRUE | Useful in filtering clean rows. |
| sqldf() | Executes SQL-like queries on data frames (sqldf package). | Perform SQL queries directly in R. | sqldf("SQL_QUERY") | sqldf("SELECT * FROM iris LIMIT 5") | First 5 rows of the iris dataset. | Advanced data manipulation using SQL queries. |
| order() | Returns the row positions that sort the data by specific columns; indexing with them sorts the data frame. | Reorder data frames by values. | df[order(df$column), ] | df[order(df$Age), ] | Sorted data frame. | Organizing data for better visualization. |
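
The base R rows of this table chain together naturally in a small cleaning step. The sketch below uses a made-up data frame (the names and ages are assumptions for illustration) and leaves out sqldf(), which needs a separate package.

```r
# Hypothetical data with one missing age.
df <- data.frame(
  Name = c("Ana", "Ben", "Cy", "Di"),
  Age = c(31, NA, 25, 40)
)

# Add a derived column.
df$AgeInMonths <- df$Age * 12

# complete.cases() flags the rows that contain no NA.
keep <- complete.cases(df)

# na.omit() drops the incomplete rows in one step.
clean <- na.omit(df)

# order() returns row positions; indexing with them sorts the data frame.
sorted <- clean[order(clean$Age), ]
print(sorted)
```

Because the derived column is computed from Age, the row with the missing age is incomplete in both columns and is dropped once.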

3.4.3 Data Reshaping

| Function | Definition | Purpose | Syntax | Example | Output | Application |
| --- | --- | --- | --- | --- | --- | --- |
| cbind() | Combines vectors or columns side by side. | Combine data horizontally. | cbind(vector1, vector2) | cbind(city, state) | Matrix with one column per vector (a data frame if any argument is a data frame). | Merging data by columns. |
| rbind() | Combines data frames by rows. | Append rows to an existing data frame. | rbind(df1, df2) | rbind(df1, df2) | Data frame with new rows. | Combining multiple data sets. |
| merge() | Merges two data frames on common columns. | Combine related data frames. | merge(df1, df2, by = "column") | merge(df1, df2, by = "ID") | Merged data frame. | Joining data sets by shared keys. |
| melt() | Converts wide data into long format (reshape2). | Reshape data for better visualization. | melt(data, id.vars = ...) | melt(df, id.vars = "ID") | Long-format data frame. | Preparing data for plotting. |
| dcast() | Aggregates and reshapes long-format data to wide format (reshape2). | Summarize and reshape data. | dcast(data, formula, fun.aggregate) | dcast(melted, ID ~ variable, sum) | Wide-format data frame. | Pivot tables or summaries for reports. |
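
The three base R functions in this table can be sketched as follows. All data values are made up for illustration, and melt() and dcast() are left out because they come from an add-on package.

```r
city <- c("Pune", "Austin")
state <- c("MH", "TX")

# cbind() on plain vectors returns a character matrix, not a data frame.
m <- cbind(city, state)

df1 <- data.frame(ID = c(1, 2), score = c(90, 85))
df2 <- data.frame(ID = c(3, 4), score = c(70, 95))

# rbind() stacks data frames that share the same column names.
stacked <- rbind(df1, df2)

# merge() joins on the key column; by default only matching IDs are kept.
groups <- data.frame(ID = c(2, 3, 5), group = c("a", "b", "c"))
joined <- merge(stacked, groups, by = "ID")
print(joined)
```

Only IDs 2 and 3 appear in both inputs, so the merged result has two rows; passing all = TRUE to merge() would keep the unmatched rows as well.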

3.4.4 Grouping Functions

| Function | Definition | Purpose | Syntax | Example | Output | Application |
| --- | --- | --- | --- | --- | --- | --- |
| apply() | Applies a function to rows or columns of a matrix. | Summarize rows/columns of data. | apply(matrix, margin, func) | apply(matrix, 1, sum) | Row sums of matrix. | Matrix calculations. |
| lapply() | Applies a function to each element of a list. | Process elements of lists. | lapply(list, func) | lapply(list, length) | List of lengths. | Working with data in lists. |
| sapply() | Similar to lapply() but simplifies the output. | Easier to get a simplified result. | sapply(list, func) | sapply(list, length) | Vector of lengths. | Cleaner data summaries. |
| tapply() | Applies a function to subsets of a vector grouped by a factor. | Group-wise analysis. | tapply(vector, factor, func) | tapply(1:10, rep(1:2, 5), sum) | Grouped sums. | Summarizing data by groups. |
| mapply() | Applies a function to multiple arguments element-wise. | Element-wise operations. | mapply(func, arg1, arg2, ...) | mapply(sum, 1:5, 1:5) | Element-wise sums. | Vectorized computations. |
| rapply() | Recursively applies a function to nested list elements. | Handle deeply nested lists. | rapply(list, func, how = "replace") | rapply(list, toupper, how = "replace") | Modified nested list (the default how = "unlist" returns a flattened vector instead). | Cleaning or transforming complex lists. |
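
To make the differences between the apply-family functions concrete, the sketch below runs each one on a small made-up input. The object names are assumptions for illustration; every call is base R.

```r
# 2 x 3 matrix; its rows are (1, 3, 5) and (2, 4, 6).
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)

items <- list(a = 1:3, b = letters[1:2])
len_list <- lapply(items, length)   # stays a list
len_vec <- sapply(items, length)    # simplified to a named vector

# Odd positions fall in group 1, even positions in group 2.
group_sums <- tapply(1:10, rep(1:2, 5), sum)

# sum(1, 1), sum(2, 2), and so on.
pair_sums <- mapply(sum, 1:5, 1:5)

nested <- list(x = "a", y = list(z = "b"))
# how = "replace" keeps the nested structure; the default would flatten it.
upper <- rapply(nested, toupper, how = "replace")

print(row_sums)
print(len_vec)
print(group_sums)
print(pair_sums)
str(upper)
```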
