Topic 1 - Financial Analytics and The R Environment

Introduction to Financial Analytics and the R Environment

Prof. Md Mohan Uddin, PhD

2024-11-11
Table of contents

Preface

1 Financial Analytics
1.1 AnalySIS vs AnalyTICS
1.2 Definition of Financial Analytics
1.3 Key Notes about Financial Analytics
1.4 Applications of Financial Analytics

2 Getting Started with R and RStudio
2.1 The R Console
2.2 The RStudio Interface
2.2.1 Source/Script Editor Panel (Top-Left)
2.2.2 Console Panel (Bottom-Left)
2.2.3 Environment/History Panel (Top-Right)
2.2.4 Files/Plots/Packages/Help Panel (Bottom-Right)
2.3 Working Directory
2.4 Projects in RStudio
2.5 Getting Help
2.5.1 Using the command
2.5.2 Getting help from R community
2.5.3 Learning from websites and online courses
2.6 Assignment
2.6.1 Assignment process using the operators
2.6.2 Reassignments
2.7 Functions
2.7.1 Structure of functions in R
2.7.2 Creating our own function in R
2.7.3 Built-in functions in R
2.8 Arithmetic Operators in R
2.9 Relational (Comparison) Operators in R
2.10 Logical Operators in R
2.11 R Packages
2.11.1 Installing R Packages
2.11.2 Loading R Packages
2.12 R Objects
2.12.1 Types of R objects
2.13 Data Reading and Writing to and from R
2.13.1 Reading and Writing Data in Text Formats
2.13.2 Reading and Writing Data with Excel Files
2.13.3 Reading and Writing Data with R’s Native Formats
2.13.4 Connecting to Databases
2.13.5 Reading and Writing Data from the Web
2.13.6 Other File Formats

References

Preface

In this topic we have two learning goals.


First, we will be introduced to financial analytics: what financial analytics is, how it
differs from financial analysis, what tools are usually employed for it, and some examples
of its applications.
Second, we will be introduced to the R computing environment, as this is going to be
the main platform for learning and applying financial analytics throughout our learning
journey.

1 Financial Analytics

1.1 AnalySIS vs AnalyTICS

The world is increasingly governed by more sophisticated, data-driven decisions. This is why
terms like analysis and analytics are so commonly recognized. Both analysis and analytics
involve examining data or information to uncover meaningful insights. For this reason, these
terms are often used interchangeably (though they shouldn’t be!). This overlap often creates
confusion for many of us. Do these terms carry different meanings? If so, how are they
different?
Yes, there is a boundary between the meanings of these terms, at least from two perspectives:
the “why” perspective and the “how” perspective. From the “why” perspective, analySIS
aims to reveal the trends, status, or performance of an event or phenomenon that has already
occurred or is currently happening. In contrast, analyTICS is intended to predict the future
path of an event or phenomenon for optimal, actionable, and insightful decision-making and
strategic planning. From the “how” perspective, analySIS primarily focuses on describing data,
identifying relationships, and visualizing historical data. Conversely, analyTICS goes further
by deriving insights from historical data and using advanced statistical modeling, machine
learning algorithms, and predictive and prescriptive models to forecast future outcomes.

1.2 Definition of Financial Analytics

Financial analytics is an iterative process that uses statistical models and computerized
algorithms to develop predictive or prescriptive models based on financial and related data,
ultimately supporting better financial decision-making (Bennett and Hugen 2016).

1.3 Key Notes about Financial Analytics

From the above discussion, we find the following key notes about financial analytics. Financial
analytics

• is an iterative process,
• uses statistical models and computerized algorithms,
• develops predictive and prescriptive models,
• uses financial and relevant data, and
• supports better financial decision-making.

1.4 Applications of Financial Analytics

Financial analytics has numerous applications, including budgeting and forecasting, credit risk
management, predictive investment and portfolio optimization, financial fraud detection and
prevention, and algorithmic trading.

2 Getting Started with R and RStudio

2.1 The R Console

We can type commands at the > prompt in the default R console. If a command is not
complete, R shows the continuation prompt +, and we can complete the command by typing the
remaining part. Sometimes we may need to recall previous commands, and that is possible
by pressing the up and down arrow keys. The R console also provides basic auto-completion
of function names and file names when we press the Tab key.

2.2 The RStudio Interface

More productive use of R is possible with RStudio, a free and open-source integrated development
environment (IDE) offered by Posit. The desktop version is downloaded and installed after
installing R. Alternatively, one can use the online Posit Cloud for the same purpose.
Opening RStudio should show three or four panels. If only three panels are open, we can open
the fourth one, a new R script, by clicking File > New File > R Script.
The RStudio interface has four main panels, each serving a specific purpose. We can customize,
resize, and reposition these panels when needed.

2.2.1 Source/Script Editor Panel (Top-Left)

This is where we can write and edit our R scripts, functions, markdown files, and documents.
It allows us to save, run, and organize code.

2.2.2 Console Panel (Bottom-Left)

The Console is where we can directly enter and execute R commands. It displays output from
the code we run and is useful for testing quick commands or checking results immediately.

2.2.3 Environment/History Panel (Top-Right)

• Environment Tab: Displays all the objects, data frames, variables, functions, and other
elements that are currently in memory.
• History Tab: Shows a list of previously executed commands, allowing us to reuse or
review past commands.

2.2.4 Files/Plots/Packages/Help Panel (Bottom-Right)

This panel serves multiple functions.

• Files Tab: Allows us to navigate our file directory.
• Plots Tab: Displays plots and visualizations generated by our code.
• Packages Tab: Manages installed packages, enabling us to install, load, and unload them.
• Help Tab: Provides documentation on functions, packages, and other R-related topics.

2.3 Working Directory

The current working directory is displayed in the Console area of RStudio. We can also get the
working directory by using the command getwd(). To set a new working directory, the command
is setwd("our/working/directory"), where "our/working/directory" is the path
to the folder that we want to use as our working directory. There are other ways of getting and
setting the working directory, which we will cover later.
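The two commands can be tried together as in the minimal sketch below. The temporary folder returned by tempdir() is only a stand-in for a real project folder, and the names old_wd and new_wd are chosen here for illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Switch to another folder; tempdir() stands in for a real project folder.
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Go back to where we started.
setwd(old_wd)
```

Saving the old directory before calling setwd() makes it easy to undo the change at the end of a script.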

2.4 Projects in RStudio

A project is a powerful, convenient, and efficient way to organize work, especially when working
on multiple tasks or collaborations. Using projects helps keep files, data, and settings organized
by setting up a self-contained environment with its own working directory. To create a new
project in RStudio:

• Go to: File > New Project.
• Choose Location:
– New Directory: Create a new folder for the project.
– Existing Directory: Use an existing folder as the project directory.
• Select Project Type: We can create an empty project, or any other type of project,
depending on our needs.
• Finish Setup: Name the project and specify the directory.

RStudio will switch to this new project environment.

2.5 Getting Help

It is rarely possible to remember all the commands and their arguments, structures, etc.,
unless we practice them frequently. Getting help is therefore very common when using R. There
are several convenient ways of getting help.

2.5.1 Using the command

• Using help() or ? before the name of the function

help(mean)
?mean

• Running the examples from a help page with the example() command

example(mean)

• Searching by keyword with the help.search() command

help.search("arithmetic mean")
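Two other base R helpers are useful when we only half remember a function name or its arguments. The sketch below is an addition to the commands above; apropos() and args() are standard functions in base R, and the search pattern is just an example.

```r
# apropos() lists objects on the search path whose names match a pattern.
matches <- apropos("^read\\.csv")
print(matches)

# args() shows the arguments of a function without opening its help page.
args(setwd)

# ??"topic" is shorthand for help.search("topic"). It is commented out here
# because it opens the help viewer.
# ??"arithmetic mean"
```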

2.5.2 Getting help from R community

The R programming community is known for its collaborative spirit and extensive resources,
making it easier for users of all skill levels to seek help and expand their knowledge. There
are several ways to connect with the R community, ranging from online forums to in-person
events. Below are some of the primary resources available for R users.
RStudio Community Forum
The RStudio Community Forum (accessible at https://community.rstudio.com/) is a valuable
platform for discussing a wide range of topics related to R and RStudio. The forum provides
a structured environment where users can ask questions about R programming, packages,
data visualization, and Shiny applications. It’s particularly useful for beginners due to its
organized categories and approachable community. Users are encouraged to describe their
questions clearly and provide code examples to facilitate helpful responses.

Stack Overflow
Stack Overflow is a popular question-and-answer website for programmers and has a dedicated
R programming community. Beginners can search existing questions or post new ones, tagging
them with [r] to get responses from experienced R users. When posting questions, it’s beneficial
to include clear descriptions, code snippets, and data examples to facilitate helpful answers.
Stack Overflow is available at https://stackoverflow.com/.
R-Help Mailing List
The R-help mailing list is one of the original resources for R users seeking assistance. This mailing
list provides an archive of previously discussed topics, which can be browsed for solutions or
reference material. Users can subscribe to the list or view the archive to access information or
post questions. The mailing list is available at https://stat.ethz.ch/mailman/listinfo/r-help. It
is advisable to include reproducible examples and clear explanations when posting to maximize
the likelihood of receiving useful answers.
GitHub and GitLab Repositories
Many R packages are hosted on platforms such as GitHub and GitLab, where users can report
issues, request features, or ask questions directly in the package repositories. Each repository
typically has an “Issues” section where users can create posts. It is important to follow each
repository’s contribution guidelines and be specific in describing the question or problem to
receive helpful responses.
R4DS Online Learning Community
The R for Data Science (R4DS) Online Learning Community is an interactive platform created
to support learners as they work through Hadley Wickham’s book R for Data Science. This
community offers various resources, including Slack channels for discussions, book clubs, and
coding challenges. The R4DS community is suitable for users interested in working collaboratively
with others and can be accessed at https://www.rfordatasci.com/.
Social Media and Online Communities
R users are active on social media platforms and other online communities:
Twitter (X): The #rstats hashtag is frequently used by R programmers to share resources, tips,
and updates.
Reddit: Subreddits such as r/Rlanguage and r/datascience provide forums for discussing R
programming, data science, and statistical analysis. Users can post questions, share articles,
and discuss challenges related to R.
YouTube: Several YouTube channels and video tutorial series focus on R programming, with
content tailored for beginners. Channels like StatQuest with Josh Starmer, Data School, and
freeCodeCamp.org provide visual, step-by-step tutorials covering fundamental R programming
topics, including data analysis, visualization, and statistical methods.

R Bloggers
R Bloggers is a community-driven website that aggregates blog posts about R programming
from contributors worldwide. For beginners, R Bloggers offers tutorials, tips, and insights on a
range of R topics, from basic programming and data manipulation to advanced analytics. The
website is a good resource for staying updated on R developments and discovering
beginner-friendly tutorials. Visit R Bloggers at https://www.r-bloggers.com/.
Local R User Groups and R-Ladies
Local R user groups provide in-person and virtual events where users can network, participate
in workshops, and share knowledge. These groups can be found on Meetup.com by searching
for “R user group” or “R programming.”
R-Ladies is a global organization aimed at promoting gender diversity within the R community.
The organization supports local chapters worldwide, offering an inclusive environment
for learning R and building professional networks. More information is available at
https://rladies.org/.
R Conferences and Events
Attending R-focused conferences is an excellent way to engage with the R community and
learn from leading experts. Two prominent events include:
RStudio Conference: This conference focuses on the latest advancements in RStudio products
and R programming.
useR! Conference: The useR! Conference is the primary annual R user conference, featuring
talks, workshops, and community events that cover a broad spectrum of R topics.
These conferences offer opportunities for hands-on learning, networking, and exposure to recent
developments in the R ecosystem.

2.5.3 Learning from websites and online courses

The Comprehensive R Archive Network (CRAN)


The Comprehensive R Archive Network (CRAN) is the official repository for R, offering extensive
documentation, guides, and package downloads. CRAN’s Contributed Documentation
section provides manuals, tutorials, and resources created by experienced R users. These include
the R Language Definition and the Introduction to R, which are invaluable for beginners
who want a detailed understanding of the language’s structure and functions. Access CRAN
at https://cran.r-project.org/.
RStudio’s Resources and Cheatsheets
RStudio offers an array of beginner-friendly resources, including cheatsheets and video tutorials.
These cheatsheets cover essential topics, such as dplyr for data manipulation, ggplot2 for data
visualization, and R Markdown for reproducible reporting. In addition, the RStudio Education
page contains links to webinars, interactive courses, and tutorial videos. RStudio’s resources
can be found at https://www.rstudio.com/resources/.
swirl: Learn R Programming Interactively
swirl is an interactive R package that teaches R programming directly within the R console.
It offers lessons in data science and statistical analysis while allowing users to practice writing
code. This interactive approach is particularly helpful for beginners, as it provides immediate
feedback on coding exercises. swirl can be installed and accessed from CRAN and offers a
range of tutorials on basic and advanced topics. Learn more at https://swirlstats.com/.
R for Data Science (R4DS) Online Learning Community
The R for Data Science (R4DS) Online Learning Community is an extension of Hadley
Wickham’s popular book R for Data Science. This community is ideal for beginners
who want to progress through the book alongside others and seek support from mentors.
The R4DS community offers book clubs, Slack channels, and discussion groups, making
it a supportive environment for learning and troubleshooting. Access the community at
https://www.rfordatasci.com/.
DataCamp and Coursera R Courses
DataCamp and Coursera are online learning platforms that offer structured courses in R
programming. Both platforms have introductory R courses that guide users through the
basics of programming, data manipulation, and visualization. These courses include interactive
coding exercises and projects that reinforce concepts as users progress. DataCamp's R courses
are available at https://www.datacamp.com/, and Coursera's R courses are available at
https://www.coursera.org/.

2.6 Assignment

In R, assignment refers to the process of storing a value or a data object, such as a vector,
matrix, or function, within a named variable. Assignment allows R programmers to save data
in variables for later use, enabling code reusability, organization, and efficient data manipulation.
The assigned variables can then be called upon throughout the program, making complex
calculations and data analysis easier to handle.

2.6.1 Assignment process using the operators

R provides several operators for performing assignments. Each operator serves the same
primary function, storing values in variables, but is used differently based on coding
style and preference. The three assignment operators are:

• Leftward Assignment: <-

x <- 100
country <- "Bangladesh"

• Rightward Assignment: ->

100 -> x
"Bangladesh" -> country

• Equals Assignment: =

x = 100
country = "Bangladesh"

In all three examples, the value 100 is assigned to the variable x, and the string
"Bangladesh" is assigned to the variable country.
While the -> and = operators are acceptable in R, the <- operator is generally preferred and
is the convention followed in most R code and style guides.
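One practical reason to prefer <- is that = has a second job inside function calls, where it names an argument instead of creating a variable. The short sketch below illustrates the difference; the variable names m1, m2, and y are chosen only for this example.

```r
# Inside a function call, = matches the argument called x of mean();
# it does not create a variable named x in our workspace.
m1 <- mean(x = c(1, 2, 3))
exists("x", inherits = FALSE)   # FALSE when run in a fresh session

# With <-, the vector is first stored in y and then passed on to mean().
m2 <- mean(y <- c(4, 5, 6))
print(y)
print(c(m1, m2))
```

Writing an assignment inside a call, as in the second case, is legal but hard to read, so it is usually better to assign on a separate line.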

2.6.2 Reassignments

In R, variables can be reassigned and that simply updates the variable with a new value.

x <- 100
x <- 500
country <- "Bangladesh"
country <- "Mexico"

In the above example, the print() command prints the final value of each variable after the
reassignment.

print(x)

[1] 500

print(country)

[1] "Mexico"

The previous assignment is forgotten, as reflected by the output of the print() command.
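A reassignment may also use the current value of the variable on the right-hand side. The following sketch uses a hypothetical account balance and interest rate to show this.

```r
# Start with a hypothetical balance.
balance <- 1000

# The right-hand side is evaluated first, using the old value of balance,
# and the result then replaces the old value.
balance <- balance * (1 + 0.05)
print(balance)
```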

2.7 Functions

Functions in R provide a means to package and execute blocks of code that perform specific
tasks. Functions are reusable units of code designed to accomplish a particular computation
or sequence of operations. They help streamline complex calculations, automate repetitive
processes, and improve the organization of code. By using functions, programmers can break
down complex problems into simpler parts, making code easier to read, debug, and maintain.

2.7.1 Structure of functions in R

In R, functions are defined using the function keyword, followed by the arguments in
parentheses and the body in curly braces. A function’s syntax in R generally has the following
structure.

function_name <- function(argument1, argument2, ...) {
  # body: the code that computes the result
  return(result)
}

The structure elements are explained below.


Function Name: A descriptive name given to the function that reflects its purpose.
Arguments: Variables or placeholders within the parentheses that represent inputs to the
function.
Body: The code inside the curly braces {} that specifies what the function does. This is
where operations, calculations, and other instructions are written.
Return Value: The output produced by the function, often specified using the return() function,
although R will return the last evaluated expression by default if no return() statement
is used.
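The elements described above can be seen in the small sketch below. The function future_value and its default values are hypothetical and serve only as an illustration; note that there is no return() call, so the last evaluated expression is returned.

```r
# Future value of an amount under annual compounding (illustrative helper).
# rate and years have default values, so they may be omitted in a call.
future_value <- function(amount, rate = 0.05, years = 1) {
  amount * (1 + rate)^years   # last expression, returned implicitly
}

fv_default <- future_value(1000)                         # uses the defaults
fv_custom  <- future_value(1000, rate = 0.10, years = 2) # overrides them
print(c(fv_default, fv_custom))
```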

2.7.2 Creating our own function in R

Below is an example of a simple user-defined function that calculates the price to pay for a
product after a certain percentage of discount.

price2pay <- function(p, q, d) {
  # p: price per unit, q: quantity, d: discount rate (e.g., 0.10 for 10%)
  result <- p*q*(1-d)
  return(result)
}

Now, let’s say we go to the shop and buy 100 units of the product, each priced at $12. The
seller provides a 10% discount. From our manual calculation, we can see that the price to
pay = 100 * $12 * (1-0.10) = $1,080. Let us see if our function price2pay gives us
the same result.

price2pay(12, 100, 0.10)

[1] 1080

Isn't it wonderful? We have just created a usable function in R.
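The same function can also be called with named arguments, and because R arithmetic works on whole vectors, it accepts several discount rates at once. The sketch below repeats the definition so that it runs on its own; the discount rates are made-up values.

```r
price2pay <- function(p, q, d) {
  result <- p * q * (1 - d)
  return(result)
}

# Named arguments can be supplied in any order.
total_named <- price2pay(q = 100, d = 0.10, p = 12)

# A vector of discount rates gives one total per rate.
totals <- price2pay(12, 100, c(0, 0.10, 0.25))

print(total_named)
print(totals)
```

Naming the arguments reduces the risk of accidentally swapping the price and the quantity in a call.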

2.7.3 Built-in functions in R

R includes numerous built-in functions that perform common tasks. These are pre-defined
and readily available in the R environment. Some examples follow.

• mean(): Calculates the mean (average) of a numeric vector.
• sum(): Computes the sum of values in a vector.
• print(): Outputs values or messages to the console.
• plot(): Creates a visual plot of data.

We have already used the print() function. Let us use the sum() function.

a <- 10
b <- 14
c <- 20
sum(a, b, c)

[1] 44
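Not every built-in function accepts separate values the way sum() does. As a cautionary sketch, mean() expects a single vector as its first argument, so the values should be combined with c() first.

```r
a <- 10
b <- 14
c <- 20   # R still finds the function c() when c is used in a call

total <- sum(a, b, c)        # sum() adds up all the values it is given
avg   <- mean(c(a, b, c))    # mean() needs the values combined into one vector

# mean(a, b, c) does NOT average the three values: b and c are matched
# to other arguments of mean() (trim and na.rm), not treated as data.
wrong <- mean(a, b, c)

print(c(total, avg, wrong))
```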

2.8 Arithmetic Operators in R

Arithmetic operators perform basic mathematical operations on numeric data. They work on
both individual numbers and larger data structures like vectors and matrices.

• + : Addition
• - : Subtraction
• * : Multiplication

• / : Division
• ^ or ** : Exponentiation
• %% : Modulus (remainder of division)
• %/% : Integer division

Examples of the uses of these operators are given below.

l <- 100
s <- 30

Now let us see how these operators work. See the outputs.

l + s

[1] 130

l - s

[1] 70

l * s

[1] 3000

l / s

[1] 3.333333

l ^ s

[1] 1e+60

l %% s

[1] 10

l %/% s

[1] 3
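As noted at the start of this section, the same operators also work element by element on vectors and matrices. The sketch below uses a made-up series of four closing prices to illustrate this.

```r
# Hypothetical closing prices of a stock on four consecutive days.
prices <- c(100, 102, 99, 105)

# Multiplying a vector by a number multiplies every element.
position_value <- prices * 10

# Dividing two vectors of equal length works element by element:
# each price is divided by the previous day's price to get simple returns.
simple_returns <- prices[-1] / prices[-length(prices)] - 1

# Operators apply to every cell of a matrix as well.
mat <- matrix(1:4, nrow = 2)
doubled <- mat * 2

print(position_value)
print(round(simple_returns, 4))
print(doubled)
```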

2.9 Relational (Comparison) Operators in R

Relational operators are used to compare values. They return logical values (TRUE or FALSE)
based on whether a specified relationship holds between values.

• == : Equal to
• != : Not equal to
• < : Less than
• > : Greater than
• <= : Less than or equal to
• >= : Greater than or equal to

Let us assign 50 and 100 to two variables, m and n, respectively.

m <- 50
n <- 100

Examples of these operators, with their output, follow.

m == n

[1] FALSE

m != n

[1] TRUE

m < n

[1] TRUE

m >= n

[1] FALSE
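Relational operators also compare vectors element by element, which is handy for counting or filtering. The sketch below uses made-up daily returns, and also shows why == should be used with care on decimal numbers; all.equal() is the base R function for comparing numbers with a small tolerance.

```r
# Hypothetical daily returns.
returns <- c(0.02, -0.01, 0.00, 0.03)

# One TRUE/FALSE per element.
is_gain <- returns > 0
print(is_gain)

# TRUE counts as 1 in arithmetic, so sum() counts the gains.
n_gains <- sum(is_gain)
print(n_gains)

# Decimal numbers are stored approximately, so exact equality can fail.
exact_equal <- (0.1 + 0.2 == 0.3)
near_equal  <- isTRUE(all.equal(0.1 + 0.2, 0.3))
print(c(exact_equal, near_equal))
```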

2.10 Logical Operators in R

Logical operators are used to combine multiple logical conditions or invert logical values. These
operators are crucial in decision-making structures, such as if statements, where compound
conditions are often required.

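• & : Logical AND (element-wise)
• | : Logical OR (element-wise)
• && : Logical AND for single (length-one) values; the right side is evaluated only if
needed (since R 4.3.0, using it with longer vectors is an error)
• || : Logical OR for single (length-one) values; the right side is evaluated only if
needed (since R 4.3.0, using it with longer vectors is an error)
• ! : Logical NOT (inverts logical value)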

Let us assign TRUE and FALSE to two variables, u and v, respectively.

u <- TRUE
v <- FALSE

Examples of these operators, with their output, follow.

u & v

[1] FALSE

u | v

[1] TRUE

!u

[1] FALSE

u && v

[1] FALSE
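The difference between the element-wise and the single-value operators shows up once vectors and if statements are involved. In the sketch below, the prices, volumes, and the variable limit are made-up values for illustration.

```r
price  <- c(95, 105, 120)
volume <- c(2000, 500, 3000)

# & and | combine two logical vectors element by element.
both_conditions   <- price > 100 & volume > 1000
either_condition  <- price > 100 | volume > 1000
print(both_conditions)
print(either_condition)

# && is meant for single values inside if(). It stops as soon as the answer
# is known, so limit > 0 is never evaluated when limit is NULL.
limit <- NULL
if (!is.null(limit) && limit > 0) {
  status <- "limit set"
} else {
  status <- "no limit"
}
print(status)
```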

2.11 R Packages

Packages are central to the power and versatility of R. R packages are collections of functions,
data sets, and documentation that extend the capabilities of the R language. R packages allow
users to easily add new functionalities without needing to write complex code from scratch,
significantly enhancing productivity and flexibility. By default, R contains a small number of
essential packages, called base packages. Other packages are created and shared by developers
and researchers worldwide.
The base packages are preloaded with every R session. These are: stats, graphics,
grDevices, utils, datasets, methods, and base. To see which packages are currently
loaded, we can use the command search() and to see which packages are installed we can
use the command library().

2.11.1 Installing R Packages

To use a package in R, it must first be installed. The base packages are preinstalled. Other
R packages can be easily installed from CRAN (the Comprehensive R Archive Network), a
widely-used repository of over 18,000 R packages, as well as from other sources like GitHub.
Installing Packages from CRAN
To install a package from CRAN, the install.packages() function is used. This function
downloads and installs the package on the user’s system. Once installed, the package is stored
on the user’s computer and can be loaded for use in any R session.
Installing Packages from GitHub
Some packages are hosted on GitHub, especially those still under development. To install
GitHub packages, the devtools package, which contains the install_github() function, is
often used.

2.11.2 Loading R Packages

After installation, packages need to be loaded into the R session before their functions can be
used. The library() function is used to load packages.
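The install-once, load-every-session workflow can be sketched as follows. The installation lines are commented out because they need internet access, readxl is used only as an example package name, and "user/repo" is a placeholder rather than a real repository.

```r
# One-time installation from CRAN:
# install.packages("readxl")

# One-time installation from GitHub ("user/repo" is a placeholder):
# devtools::install_github("user/repo")

# Loading is needed in every new session. stats is a base package,
# so this call works on any R installation.
library(stats)
stats_loaded <- "package:stats" %in% search()
print(stats_loaded)

# requireNamespace() reports whether a package is installed without loading it.
has_readxl <- requireNamespace("readxl", quietly = TRUE)
print(has_readxl)
```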

2.12 R Objects

Nearly everything in R, from a single value to a complete dataset, is treated as an
object. An object can represent various types of data, structures, and datasets. The structures
serve as containers that hold values. Objects also include complex structures like models and
functions.
R objects enable users to organize, manipulate, and analyze data effectively.

2.12.1 Types of R objects

There are various types of objects in R. Most important objects are summarized in the following
table.

| Object Type | Description | Structure | Data Type | Example | Use Cases |
|---|---|---|---|---|---|
| Vector | Basic data structure with a sequence of elements of the same type. | 1-dim | Numeric, Char, Logical | c(1, 2, 3) | Simple sequences, basic calculations |
| Matrix | 2D structure with elements of the same type arranged in rows/columns. | 2-dim | Numeric, Char | matrix(1:6, nrow = 2, ncol = 3) | Mathematical operations, linear algebra |
| Array | Multi-dimensional structure for elements of the same type. | Multi-dim | Numeric, Char | array(1:12, dim = c(2, 3, 2)) | Complex data, like time series, 3D data |
| List | Flexible structure for elements of different types and structures. | 1-dim | Mixed Types | list(name = "John", age = 30, scores = c(85, 90)) | Heterogeneous data, grouping results |
| Data Frame | Table-like structure with each column as a variable (mixed types). | 2-dim | Mixed Types | data.frame(Name = c("Alice", "Bob"), Age = c(25, 30)) | Data analysis, statistical modeling |
| Factor | Special object representing categorical data with unique levels. | 1-dim | Categorical | factor(c("male", "female", "female")) | Categorical variables, statistical models |
| Function | Object that performs a specific task or calculation. | N/A | Code | function(x) { return(x^2) } | Creating reusable code for tasks |
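The object types summarized above can be created and inspected as in the sketch below, which reuses the example expressions from the table. The variable names are chosen here for illustration, and class() is used to report the type of each object.

```r
v   <- c(1, 2, 3)
mat <- matrix(1:6, nrow = 2, ncol = 3)
arr <- array(1:12, dim = c(2, 3, 2))
lst <- list(name = "John", age = 30, scores = c(85, 90))
df  <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
fct <- factor(c("male", "female", "female"))
sq  <- function(x) {
  return(x^2)
}

# class() reports the type of an object; [1] keeps the first entry because
# some objects report more than one class.
objects <- list(v, mat, arr, lst, df, fct, sq)
types <- sapply(objects, function(obj) class(obj)[1])
print(types)

# A few other inspection helpers.
print(dim(mat))       # rows and columns of the matrix
print(levels(fct))    # unique categories of the factor
print(lst$scores)     # list elements are accessed by name with $
```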

2.13 Data Reading and Writing to and from R

In R, various tools and functions facilitate reading and writing data to and from different file
formats. These tools allow for importing data from spreadsheets, databases, and web sources,
and exporting data for reporting, analysis, or further processing.

2.13.1 Reading and Writing Data in Text Formats

• CSV Files: Common format for data exchange, especially for data frames.
– Read CSV: read.csv("file.csv")
– Write CSV: write.csv(data, "file.csv")
• TXT Files: Used for plain text data, often with customized delimiters.
– Read TXT: read.table("file.txt", sep = "\t") (set the separator as needed)
– Write TXT: write.table(data, "file.txt", sep = "\t")
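A full write-then-read round trip might look like the sketch below. The data frame trades is made up, and tempfile() provides throwaway file paths in place of real file names such as "file.csv".

```r
# A small made-up data set.
trades <- data.frame(ticker = c("AAA", "BBB"),
                     price  = c(12.5, 30),
                     stringsAsFactors = FALSE)

# CSV round trip. row.names = FALSE avoids writing an extra column of row numbers.
csv_path <- tempfile(fileext = ".csv")
write.csv(trades, csv_path, row.names = FALSE)
trades_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Tab-delimited round trip. header = TRUE tells read.table() that the
# first line holds the column names.
txt_path <- tempfile(fileext = ".txt")
write.table(trades, txt_path, sep = "\t", row.names = FALSE)
trades_txt <- read.table(txt_path, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)

print(trades_csv)
print(trades_txt)
```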

2.13.2 Reading and Writing Data with Excel Files

R supports Excel files like .xlsx and .xls. While base R does not provide direct functions
for Excel, there are packages designed for this purpose:

• readxl Package:
– Read Excel: readxl::read_excel("file.xlsx")
• writexl Package:
– Write Excel: writexl::write_xlsx(data, "file.xlsx")

2.13.3 Reading and Writing Data with R’s Native Formats

R has its own native formats, preserving metadata like factors and attributes, making them
ideal for R-specific projects.

• RDS Files:
– Read RDS: readRDS("file.rds")
– Write RDS: saveRDS(data, "file.rds")
• RData Files: Store multiple R objects in one file.
– Load RData: load("file.RData")
– Save RData: save(data1, data2, file = "file.RData")
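The sketch below shows both native formats with a made-up portfolio table. Note that readRDS() returns the object, so we choose the name it is stored under, while load() restores objects under the names they were saved with.

```r
# A made-up data frame with a factor column.
portfolio <- data.frame(asset  = factor(c("bond", "stock", "stock")),
                        weight = c(0.40, 0.35, 0.25))
rate <- 0.05

# RDS: one object per file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(portfolio, rds_path)
portfolio_back <- readRDS(rds_path)
print(is.factor(portfolio_back$asset))   # the factor column is preserved

# RData: several objects in one file.
rdata_path <- tempfile(fileext = ".RData")
save(portfolio, rate, file = rdata_path)
rm(portfolio, rate)                 # remove them from the session
loaded_names <- load(rdata_path)    # restores both objects under their own names
print(loaded_names)
```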

2.13.4 Connecting to Databases

For larger datasets or those stored in databases, R offers packages to connect directly to
relational databases (e.g., MySQL, PostgreSQL) or non-relational databases.

• DBI and RMySQL/RPostgres Packages:
– Establish Connection: dbConnect(RMySQL::MySQL(), dbname = "database_name", host = "host")
– Read from Database: dbReadTable(connection, "table_name")
– Write to Database: dbWriteTable(connection, "table_name", data)
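The same DBI functions can be tried without a database server. The sketch below assumes that the DBI and RSQLite packages are installed; RSQLite is swapped in for RMySQL here only because it can create a temporary in-memory database, and the table and column names are made up.

```r
# Run the example only if both packages are available.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # ":memory:" creates a temporary database that disappears on disconnect.
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  prices <- data.frame(ticker = c("AAA", "BBB"), close = c(12.5, 30))
  DBI::dbWriteTable(con, "prices", prices)           # write a table
  prices_back <- DBI::dbReadTable(con, "prices")     # read the whole table
  high <- DBI::dbGetQuery(con,
                          "SELECT ticker FROM prices WHERE close > 20")

  DBI::dbDisconnect(con)                             # always close the connection

  print(prices_back)
  print(high)
}
```

With a MySQL or PostgreSQL server, only the dbConnect() call would change; the reading and writing functions stay the same.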

2.13.5 Reading and Writing Data from the Web

• readr Package:
– Read from Web: readr::read_csv("http://example.com/data.csv")
• httr and jsonlite Packages: For reading from APIs or JSON files.
– Read JSON: jsonlite::fromJSON("http://example.com/data.json")
– Write JSON: jsonlite::write_json(data, "file.json")

2.13.6 Other File Formats

R can handle many other file types using specific packages:

• HDF5 (Hierarchical Data Format): Ideal for large datasets.
– rhdf5 Package for reading/writing: rhdf5::h5read("file.h5", "dataset")
• Feather and Parquet Files: Highly efficient, cross-language formats for big data.
– arrow Package: arrow::read_feather("file.feather") and arrow::read_parquet("file.parqu

These tools and packages make R highly flexible for data manipulation, allowing users to
handle data in various formats easily and efficiently.
For further reading, see Chen (2024) and James et al. (2021).

References

Bennett, Mark J., and Dirk L. Hugen. 2016. Financial Analytics with R:
Building a Laptop Laboratory for Data Science. Cambridge, UK: Cambridge University
Press.
Chen, Jenny K. 2024. Financial Data Analytics with R: Monte-Carlo Validation. 1st ed. Boca
Raton: Chapman and Hall/CRC. https://doi.org/10.1201/9781003469704.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction
to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York,
NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.
