Topic 1 - Financial Analytics and The R Environment
2024-11-11
Table of contents
Preface
1 Financial Analytics
1.1 AnalySIS vs AnalyTICS
1.2 Definition of Financial Analytics
1.3 Keynotes about Financial Analytics
1.4 Applications of Financial Analytics
2.12 R Objects
2.12.1 Types of R objects
2.13 Data Reading and Writing to and from R
2.13.1 Reading and Writing Data in Text Formats
2.13.2 Reading and Writing Data with Excel Files
2.13.3 Reading and Writing Data with R’s Native Formats
2.13.4 Connecting to Databases
2.13.5 Reading and Writing Data from the Web
2.13.6 Other File Formats
References
Preface
1 Financial Analytics
1.1 AnalySIS vs AnalyTICS
The world is increasingly governed by more sophisticated, data-driven decisions. This is why
terms like analysis and analytics are so commonly recognized. Both analysis and analytics
involve examining data or information to uncover meaningful insights. For this reason, these
terms are often used interchangeably (though they shouldn’t be!). This overlap often creates
confusion for many of us. Do these terms carry different meanings? If so, how are they
different?
Yes, there is a boundary between the meanings of these terms, at least from two perspectives:
the “why” perspective and the “how” perspective. From the “why” perspective, analySIS
aims to reveal the trends, status, or performance of an event or phenomenon that has already
occurred or is currently happening. In contrast, analyTICS is intended to predict the future
path of an event or phenomenon for optimal, actionable, and insightful decision-making and
strategic planning. From the “how” perspective, analySIS primarily focuses on describing data,
identifying relationships, and visualizing historical data. Conversely, analyTICS goes further
by deriving insights from historical data and using advanced statistical modeling, machine
learning algorithms, and predictive and prescriptive models to forecast future outcomes.
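The distinction can be made concrete with a small sketch. The monthly sales figures below are invented purely for illustration, and a simple linear trend is assumed as the forecasting model. The first part describes what has already happened (analysis); the second fits a model to that history and projects the next month (analytics).

```r
# Hypothetical monthly sales figures (illustrative only, not real data)
sales <- data.frame(month = 1:5, amount = c(100, 110, 120, 130, 140))

# AnalySIS: describe what has already happened
avg_sales <- mean(sales$amount)   # average of the observed months
growth    <- diff(sales$amount)   # month-over-month change

# AnalyTICS: fit a model to the history and forecast the next period
trend    <- lm(amount ~ month, data = sales)
forecast <- predict(trend, newdata = data.frame(month = 6))

print(avg_sales)
print(growth)
print(unname(forecast))
```

A real application would use far more data and a model validated against held-out observations; the linear trend here only shows where description ends and prediction begins.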
1.2 Definition of Financial Analytics
Financial analytics is an iterative process that uses statistical models and computerized algorithms to develop predictive or prescriptive models based on financial and related data, ultimately supporting better financial decision-making (Bennett and Hugen 2016).
1.3 Keynotes about Financial Analytics
From the above discussion, we can note the following key points. Financial analytics
• is an iterative process,
• uses statistical models and computerized algorithms,
• develops predictive and prescriptive models,
• uses financial and relevant data, and
• supports better financial decision-making.
1.4 Applications of Financial Analytics
Financial analytics has numerous applications, including budgeting and forecasting, credit risk management, predictive investment and portfolio optimization, financial fraud detection and prevention, and algorithmic trading.
2 Getting Started with R and RStudio
We can type commands at the > prompt in the default R console. If a command is incomplete, R shows a continuation prompt, +, and we can finish the command by typing the remaining part. Previous commands can be recalled by pressing the up and down arrow keys. The R console also offers basic auto-completion of function names and file names when we press the Tab key.
RStudio, a free and open-source integrated development environment (IDE) offered by Posit, makes working with R more productive. The desktop version is downloaded and installed after installing R. Alternatively, one can use the online Posit Cloud for the same purpose.
Opening RStudio should bring us 3 or 4 panels. If 3 panels are open we have to open the 4th
panel, i.e., a new R script by clicking File > New File > R Script. The four panels of RStudio
are introduced below.
The RStudio interface has four main panels each serving a specific purpose. We can customize,
resize, and reposition these panels when needed.
The Source panel is where we can write and edit our R scripts, functions, markdown files, and documents. It allows us to save, run, and organize code.
The Console is where we can directly enter and execute R commands. It displays output from
the code we run and is useful for testing quick commands or checking results immediately.
2.2.3 Environment/History Panel (Top-Right)
• Environment Tab: Displays all the objects, data frames, variables, functions, and other
elements that are currently in memory.
• History Tab: Shows a list of previously executed commands, allowing us to reuse or
review past commands.
The current working directory is displayed in the Console area of RStudio. We can also retrieve it with the command getwd(). To set a new working directory, the command is setwd("our/working/directory"), where "our/working/directory" is the path to the folder we want to work in. There are other ways of getting and setting the working directory, which we will cover later.
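As a minimal sketch of these two commands working together, the snippet below saves the current directory, switches to another folder, and then switches back. The use of tempdir() is only a stand-in for a real project path, chosen so the example runs on any machine.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Switch to another folder; tempdir() is a placeholder for a real path
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Return to the original working directory
setwd(old_dir)
```

Restoring the original directory at the end keeps later file paths in the session from silently pointing at the wrong folder.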
Project is a powerful, convenient, and efficient way to organize work especially when working
on multiple tasks or collaborations. Using projects helps keep files, data, and settings organized
by setting up a self-contained environment with its own working directory. To create a new
project in RStudio,
• Finish Setup: Name the project and specify the directory.
It is rarely possible to remember all the commands and their arguments, structures, etc., unless we use them frequently. Looking things up is therefore a routine part of using R, and there are several convenient ways of getting help.
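help(mean)                      # open the documentation for mean()
?mean                           # shorthand for help(mean)
example(mean)                   # run the examples from the help page
help.search("arithmetic mean")  # search the help pages by keyword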
The R programming community is known for its collaborative spirit and extensive resources,
making it easier for users of all skill levels to seek help and expand their knowledge. There
are several ways to connect with the R community, ranging from online forums to in-person
events. Below are some of the primary resources available for R users.
RStudio Community Forum
The RStudio Community Forum (accessible at https://community.rstudio.com/) is a valuable
platform for discussing a wide range of topics related to R and RStudio. The forum provides
a structured environment where users can ask questions about R programming, packages,
data visualization, and Shiny applications. It’s particularly useful for beginners due to its
organized categories and approachable community. Users are encouraged to describe their
questions clearly and provide code examples to facilitate helpful responses.
Stack Overflow
Stack Overflow is a popular question-and-answer website for programmers and has a dedicated
R programming community. Beginners can search existing questions or post new ones, tagging
them with [r] to get responses from experienced R users. When posting questions, it’s beneficial
to include clear descriptions, code snippets, and data examples to facilitate helpful answers.
Stack Overflow is available at https://stackoverflow.com/.
R-Help Mailing List
The R-help mailing list is one of the original resources for R users seeking assistance. This mailing list provides an archive of previously discussed topics, which can be browsed for solutions or reference material. Users can subscribe to the list or view the archive to access information or post questions. The mailing list is available at https://stat.ethz.ch/mailman/listinfo/r-help. It is advisable to include reproducible examples and clear explanations when posting to maximize the likelihood of receiving useful answers.
GitHub and GitLab Repositories
Many R packages are hosted on platforms such as GitHub and GitLab, where users can report
issues, request features, or ask questions directly in the package repositories. Each repository
typically has an “Issues” section where users can create posts. It is important to follow each
repository’s contribution guidelines and be specific in describing the question or problem to
receive helpful responses.
R4DS Online Learning Community
The R for Data Science (R4DS) Online Learning Community is an interactive platform created
to support learners as they work through Hadley Wickham’s book R for Data Science. This
community offers various resources, including Slack channels for discussions, book clubs, and
coding challenges. The R4DS community is suitable for users interested in working collaboratively with others and can be accessed at https://www.rfordatasci.com/.
Social Media and Online Communities
R users are active on social media platforms and other online communities:
Twitter (X): The #rstats hashtag is frequently used by R programmers to share resources, tips,
and updates.
Reddit: Subreddits such as r/Rlanguage and r/datascience provide forums for discussing R
programming, data science, and statistical analysis. Users can post questions, share articles,
and discuss challenges related to R.
YouTube: Several YouTube channels and video tutorial series focus on R programming, with
content tailored for beginners. Channels like StatQuest with Josh Starmer, Data School, and
freeCodeCamp.org provide visual, step-by-step tutorials covering fundamental R programming
topics, including data analysis, visualization, and statistical methods.
R Bloggers is a community-driven website that aggregates blog posts about R programming
from contributors worldwide. For beginners, R Bloggers offers tutorials, tips, and insights on a
range of R topics, from basic programming and data manipulation to advanced analytics. The
website is a good resource for staying updated on R developments and discovering beginner-friendly tutorials. Visit R Bloggers at https://www.r-bloggers.com/.
Local R User Groups and R-Ladies
Local R user groups provide in-person and virtual events where users can network, participate
in workshops, and share knowledge. These groups can be found on Meetup.com by searching
for “R user group” or “R programming.”
R-Ladies is a global organization aimed at promoting gender diversity within the R community. The organization supports local chapters worldwide, offering an inclusive environment for learning R and building professional networks. More information is available at https://rladies.org/.
R Conferences and Events
Attending R-focused conferences is an excellent way to engage with the R community and
learn from leading experts. Two prominent events include:
RStudio Conference: This conference focuses on the latest advancements in RStudio products and R programming.
useR! Conference: The useR! Conference is the primary annual R user conference, featuring talks, workshops, and community events that cover a broad spectrum of R topics.
These conferences offer opportunities for hands-on learning, networking, and exposure to recent
developments in the R ecosystem.
page contains links to webinars, interactive courses, and tutorial videos. RStudio’s resources
can be found at https://www.rstudio.com/resources/.
swirl: Learn R Programming Interactively
swirl is an interactive R package that teaches R programming directly within the R console.
It offers lessons in data science and statistical analysis while allowing users to practice writing
code. This interactive approach is particularly helpful for beginners, as it provides immediate
feedback on coding exercises. swirl can be installed and accessed from CRAN and offers a
range of tutorials on basic and advanced topics. Learn more at https://swirlstats.com/.
R for Data Science (R4DS) Online Learning Community
The R for Data Science (R4DS) Online Learning Community is an extension of Hadley
Wickham’s popular book R for Data Science. This community is ideal for beginners
who want to progress through the book alongside others and seek support from mentors.
The R4DS community offers book clubs, Slack channels, and discussion groups, making
it a supportive environment for learning and troubleshooting. Access the community at
https://www.rfordatasci.com/.
DataCamp and Coursera R Courses
DataCamp and Coursera are online learning platforms that offer structured courses in R
programming. Both platforms have introductory R courses that guide users through the
basics of programming, data manipulation, and visualization. These courses include interactive coding exercises and projects that reinforce concepts as users progress. DataCamp's R courses are available at https://www.datacamp.com/, and Coursera's R courses at https://www.coursera.org/.
2.6 Assignment
In R, assignment refers to the process of storing a value or a data object, such as a vector,
matrix, or function, within a named variable. Assignment allows R programmers to save data
in variables for later use, enabling code reusability, organization, and efficient data manipulation. The assigned variables can then be called upon throughout the program, making complex
calculations and data analysis easier to handle.
R provides several operators for performing assignments. Each operator serves the same primary function, storing values in variables, but is used differently based on coding style and preference. The three assignment operators are:
• Leftward Assignment: <-
x <- 100
country <- "Bangladesh"
• Rightward Assignment: ->
100 -> x
"Bangladesh" -> country
• Equals Assignment: =
x = 100
country = "Bangladesh"
In all three examples, the value 100 is assigned to the variable x, and the string "Bangladesh" is assigned to the variable country.
While the -> and = operators are acceptable in R, the <- operator is generally preferred and is the convention in most R code.
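One practical reason for preferring <- can be sketched as follows. Inside a function call, = names an argument and does not create a variable, whereas <- still performs an assignment. The variable names below are hypothetical and chosen only for this illustration.

```r
# The three operators store the same value
a <- 100
100 -> b
c_val = 100
same_value <- identical(a, b) && identical(b, c_val)
print(same_value)

# Inside a call, `=` only names the argument x of mean();
# no variable is created in the workspace by this line
m1 <- mean(x = c(2, 4, 6))

# With `<-`, the vector is assigned to demo_value and then passed to mean()
m2 <- mean(demo_value <- c(2, 4, 6))
print(exists("demo_value"))
```

Both calls return the same mean, but only the second leaves a new object behind, which is why mixing the operators inside calls can be confusing.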
2.6.2 Reassignments
In R, variables can be reassigned and that simply updates the variable with a new value.
x <- 100
x <- 500
country <- "Bangladesh"
country <- "Mexico"
In the above example, the print() command shows the most recent value assigned to each variable.
print(x)
[1] 500
print(country)
[1] "Mexico"
The previous assignment is forgotten, as reflected by the output of the print() command.
2.7 Functions
Functions in R provide a means to package and execute blocks of code that perform specific tasks. Functions are reusable units of code designed to accomplish a particular computation or sequence of operations. They help streamline complex calculations, automate repetitive processes, and improve the organization of code. By using functions, programmers can break down complex problems into simpler parts, making code easier to read, debug, and maintain.
In R, functions are defined using the function keyword, followed by a parenthesized list of arguments (placeholders for the inputs) and a body, usually enclosed in braces. The resulting function is typically assigned to a name.
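The general shape can be sketched as name <- function(arguments) { body }. The function below, to_percent, is a hypothetical example written for this illustration; it shows a required argument, an argument with a default value, and the value of the last expression being returned.

```r
# General shape: name <- function(arguments) { body }
to_percent <- function(x, digits = 1) {
  # x and digits are placeholders that receive values when the function is called
  paste0(round(x * 100, digits), "%")
}

short_form <- to_percent(0.1234)              # uses the default digits = 1
long_form  <- to_percent(0.1234, digits = 2)  # overrides the default
print(short_form)
print(long_form)
```

The value of the last evaluated expression is what the function returns, so an explicit return() is optional in simple cases like this one.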
Below is an example of a simple user-defined function that calculates the price of a product after a given percentage discount.
Now, suppose we go to a shop and buy 100 units of a product priced at $12 each, and the seller offers a 10% discount. A manual calculation gives the price to pay as 100 * $12 * (1 - 0.10) = $1,080. Let us see whether our function price2pay gives the same result.
[1] 1080
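A minimal sketch of such a function, consistent with the manual calculation above, might look like the following. Only the name price2pay comes from the text; the argument names and the body are assumptions made for this illustration.

```r
# Price to pay after a discount
# quantity:      number of units bought (assumed argument name)
# unit_price:    price per unit (assumed argument name)
# discount_rate: discount as a fraction, e.g. 0.10 for 10% (assumed argument name)
price2pay <- function(quantity, unit_price, discount_rate) {
  quantity * unit_price * (1 - discount_rate)
}

# 100 units at $12 each with a 10% discount
amount_due <- price2pay(100, 12, 0.10)
print(amount_due)
```

Passing the discount as a fraction keeps the formula identical to the manual calculation; a version that accepts 10 instead of 0.10 would need to divide by 100 inside the body.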
R includes numerous built-in functions that perform common tasks. These are pre-defined
and readily available in the R environment. Some examples follow.
We have already used the print() function. Let us now use the sum() function.
a <- 10
b <- 14
c <- 20
sum(a, b, c)
[1] 44
Arithmetic operators perform basic mathematical operations on numeric data. They work on
both individual numbers and larger data structures like vectors and matrices.
• + : Addition
• - : Subtraction
• * : Multiplication
• / : Division
• ^ or ** : Exponentiation
• %% : Modulus (remainder of division)
• %/% : Integer division
l <- 100
s <- 30
Now let us see how these operators work. See the outputs.
l + s
[1] 130
l - s
[1] 70
l * s
[1] 3000
l / s
[1] 3.333333
l ^ s
[1] 1e+60
l %% s
[1] 10
l %/% s
[1] 3
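The same operators also work element by element on vectors and matrices, as mentioned at the start of this section. The prices and quantities below are hypothetical values used only to illustrate this behaviour.

```r
# Hypothetical unit prices and quantities for three products
price    <- c(12, 8, 20)
quantity <- c(100, 50, 10)

# Element-wise multiplication: each price is multiplied by its own quantity
revenue <- price * quantity
total   <- sum(revenue)
print(revenue)
print(total)

# Matrices are handled element by element as well
m <- matrix(1:4, nrow = 2)
doubled <- m * 2
print(doubled)
```

Note that * on matrices is element-wise; matrix multiplication in the linear-algebra sense uses the separate %*% operator.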
Relational operators are used to compare values. They return logical values (TRUE or FALSE)
based on whether a specified relationship holds between values.
• == : Equal to
• != : Not equal to
• < : Less than
• > : Greater than
• <= : Less than or equal to
• >= : Greater than or equal to
m <- 50
n <- 100
m == n
[1] FALSE
m != n
[1] TRUE
m < n
[1] TRUE
m >= n
[1] FALSE
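Relational operators are also vectorized, which makes them handy for filtering data. As a brief sketch, the returns below are made-up numbers; comparing the whole vector against zero yields one TRUE or FALSE per element, and that logical vector can then select or count elements.

```r
# Hypothetical daily returns
returns <- c(0.02, -0.01, 0.03, -0.04)

# One logical value per element
is_gain <- returns > 0
print(is_gain)

# Use the logical vector to keep only the gains
gains <- returns[is_gain]
print(gains)

# TRUE counts as 1 and FALSE as 0, so sum() counts the losses
n_losses <- sum(returns < 0)
print(n_losses)
```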
2.10 Logical Operators in R
Logical operators are used to combine multiple logical conditions or invert logical values. These
operators are crucial in decision-making structures, such as if statements, where compound
conditions are often required.
u <- TRUE
v <- FALSE
u & v
[1] FALSE
u | v
[1] TRUE
!u
[1] FALSE
u && v
[1] FALSE
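The examples above use & (element-wise AND), | (element-wise OR), ! (NOT), and && (AND for single values); || is the single-value counterpart of |. The sketch below, with hypothetical account figures, shows the usual division of labour: && and || inside if statements, & and | when comparing whole vectors.

```r
# Hypothetical single values: use && inside an if statement
balance     <- 5000
amount      <- 1200
is_verified <- TRUE

if (is_verified && amount <= balance) {
  status <- "approved"
} else {
  status <- "declined"
}
print(status)

# Hypothetical vectors: use & to combine conditions element by element
scores  <- c(720, 580, 650)
incomes <- c(40000, 90000, 30000)
eligible <- scores >= 600 & incomes >= 35000
print(eligible)

# ! inverts every element
flagged <- !eligible
print(flagged)
```

Because && and || expect a single TRUE or FALSE on each side, they are the safer choice for if conditions, while & and | are the right tools for filtering vectors and data frame columns.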
2.11 R Packages
Packages are central to the power and versatility of R. R packages are collections of functions,
data sets, and documentation that extend the capabilities of the R language. R packages allow
users to easily add new functionalities without needing to write complex code from scratch,
significantly enhancing productivity and flexibility. By default, R contains a small number of
essential packages, called base packages. Other packages are created and shared by developers
and researchers worldwide.
The base packages are preloaded with every R session. These are: stats, graphics,
grDevices, utils, datasets, methods, and base. To see which packages are currently
loaded, we can use the command search() and to see which packages are installed we can
use the command library().
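As a short sketch, the two checks can also be done programmatically. The helper vector base_pkgs is introduced here only for the illustration; installed.packages() is used in place of library() because it returns the installed packages as data that can be inspected in code.

```r
# Packages attached to the current session appear as "package:<name>"
attached <- search()
print(attached)

# Check which of the base packages named above are attached
base_pkgs   <- c("stats", "graphics", "grDevices", "utils",
                 "datasets", "methods", "base")
is_attached <- paste0("package:", base_pkgs) %in% attached
names(is_attached) <- base_pkgs
print(is_attached)

# Names of all installed packages, as a character vector
installed <- rownames(installed.packages())
print(head(installed))
```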
To use a package in R, it must first be installed. The base packages are preinstalled. Other
R packages can be easily installed from CRAN (the Comprehensive R Archive Network), a
widely-used repository of over 18,000 R packages, as well as from other sources like GitHub.
Installing Packages from CRAN
To install a package from CRAN, the install.packages() function is used. This function
downloads and installs the package on the user’s system. Once installed, the package is stored
on the user’s computer and can be loaded for use in any R session.
Installing Packages from GitHub
Some packages are hosted on GitHub, especially those still under development. To install
GitHub packages, the devtools package, which contains the install_github() function, is
often used.
After installation, packages need to be loaded into the R session before their functions can be
used. The library() function is used to load packages.
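The install-then-load workflow can be sketched as below. The installation lines are commented out so the sketch runs without an internet connection; readxl is used only as an example package, and "owner/repo" is a hypothetical placeholder for a GitHub repository path.

```r
# Install once (requires an internet connection)
# install.packages("readxl")
# devtools::install_github("owner/repo")   # hypothetical repository path

# Load an installed package for the current session
library(stats)   # a base package, so it is always available

# Check whether an optional package is installed before trying to load it
has_readxl <- requireNamespace("readxl", quietly = TRUE)
print(has_readxl)
```

Installation is needed only once per machine, whereas library() must be called in every new R session that uses the package.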
2.12 R Objects
There are various types of objects in R. The most important ones are summarized in the following table.
| Object Type | Description | Structure | Data Type | Example | Use Cases |
|---|---|---|---|---|---|
| Vector | Basic data structure with a sequence of elements of the same type. | 1-dim | Numeric, Character, Logical | c(1, 2, 3) | Simple sequences, basic calculations |
| Matrix | 2D structure with elements of the same type arranged in rows/columns. | 2-dim | Numeric, Character | matrix(1:6, nrow = 2, ncol = 3) | Mathematical operations, linear algebra |
| Array | Multi-dimensional structure for elements of the same type. | Multi-dim | Numeric, Character | array(1:12, dim = c(2, 3, 2)) | Complex data, like time series, 3D data |
| List | Flexible structure for elements of different types and structures. | 1-dim | Mixed types | list(name = "John", age = 30, scores = c(85, 90)) | Heterogeneous data, grouping results |
| Data Frame | Table-like structure with each column as a variable (mixed types). | 2-dim | Mixed types | data.frame(Name = c("Alice", "Bob"), Age = c(25, 30)) | Data analysis, statistical modeling |
| Factor | Special object representing categorical data with unique levels. | 1-dim | Categorical | factor(c("male", "female", "female")) | Categorical variables, statistical models |
| Function | Object that performs a specific task or calculation. | N/A | Code | function(x) { return(x^2) } | Creating reusable code for tasks |
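The examples from the table can be run together to see how R reports each object. This is a small sketch; the variable names are chosen only for the illustration.

```r
# One object of each type, using the examples from the table
v  <- c(1, 2, 3)
m  <- matrix(1:6, nrow = 2, ncol = 3)
a  <- array(1:12, dim = c(2, 3, 2))
l  <- list(name = "John", age = 30, scores = c(85, 90))
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
f  <- factor(c("male", "female", "female"))
sq <- function(x) {
  return(x^2)
}

# class() reports the type of an object; dim() and str() show its structure
print(class(v))
print(dim(m))
print(dim(a))
str(l)
str(df)
print(levels(f))
print(sq(4))
```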
2.13 Data Reading and Writing to and from R
In R, various tools and functions facilitate reading and writing data to and from different file formats. These tools allow for importing data from spreadsheets, databases, and web sources, and exporting data for reporting, analysis, or further processing.
2.13.1 Reading and Writing Data in Text Formats
• CSV Files: Common format for data exchange, especially for data frames.
– Read CSV: read.csv("file.csv")
– Write CSV: write.csv(data, "file.csv")
• TXT Files: Used for plain text data, often with customized delimiters.
– Read TXT: read.table("file.txt", sep = "\t") (set the separator as
needed)
– Write TXT: write.table(data, "file.txt", sep = "\t")
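A round trip through both formats can be sketched as follows. The data frame is made up for the illustration, and tempfile() is used instead of a fixed file name so that no existing file is overwritten.

```r
# Hypothetical data to write out
data <- data.frame(ticker = c("AAA", "BBB"), price = c(10.5, 20.25))

# CSV: write without row names, then read back
csv_path <- tempfile(fileext = ".csv")
write.csv(data, csv_path, row.names = FALSE)
csv_back <- read.csv(csv_path)
print(csv_back)

# Tab-delimited text: the same idea with an explicit separator
txt_path <- tempfile(fileext = ".txt")
write.table(data, txt_path, sep = "\t", row.names = FALSE)
txt_back <- read.table(txt_path, sep = "\t", header = TRUE)
print(txt_back)
```

Setting row.names = FALSE avoids an extra unnamed column of row numbers when the file is read back, and header = TRUE tells read.table() that the first line holds the column names.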
2.13.2 Reading and Writing Data with Excel Files
R can read and write Excel files such as .xlsx and .xls. Base R does not provide functions for Excel, but there are packages designed for this purpose:
• readxl Package:
– Read Excel: readxl::read_excel("file.xlsx")
• writexl Package:
– Write Excel: writexl::write_xlsx(data, "file.xlsx")
2.13.3 Reading and Writing Data with R’s Native Formats
R has its own native formats, preserving metadata like factors and attributes, making them
ideal for R-specific projects.
• RDS Files:
– Read RDS: readRDS("file.rds")
– Write RDS: saveRDS(data, "file.rds")
• RData Files: Store multiple R objects in one file.
– Load RData: load("file.RData")
– Save RData: save(data1, data2, file = "file.RData")
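The difference between the two native formats can be sketched as below, using a made-up data frame and temporary files. An RDS file holds a single object that is assigned to a name of our choice when read; an RData file restores one or more objects under their original names.

```r
# Hypothetical object with a factor column, which text formats would not preserve
portfolio <- data.frame(asset  = factor(c("bond", "stock")),
                        weight = c(0.4, 0.6))

# RDS: one object per file, assigned to any name on reading
rds_path <- tempfile(fileext = ".rds")
saveRDS(portfolio, rds_path)
restored <- readRDS(rds_path)
print(identical(portfolio, restored))

# RData: several objects per file, restored under their original names
note <- "rebalanced quarterly"
rdata_path <- tempfile(fileext = ".RData")
save(portfolio, note, file = rdata_path)
rm(portfolio, note)
loaded_names <- load(rdata_path)   # load() returns the names it restored
print(loaded_names)
```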
2.13.4 Connecting to Databases
For larger datasets or those stored in databases, R offers packages to connect directly to relational databases (e.g., MySQL, PostgreSQL) or non-relational databases.
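As a hedged sketch of what such a connection looks like, the example below uses the DBI package with an in-memory SQLite database through RSQLite. Neither package is named in the text above; they are assumed here because SQLite needs no server, whereas MySQL or PostgreSQL would require their own driver packages and connection details. The table and its contents are made up, and the code runs only if both packages are installed.

```r
result <- NULL

# Run only when the assumed packages are available
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # Open a temporary in-memory database
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Hypothetical trades table
  trades <- data.frame(ticker = c("AAA", "BBB", "AAA"), qty = c(10, 5, 20))
  DBI::dbWriteTable(con, "trades", trades)

  # Let the database do the aggregation and return a data frame
  result <- DBI::dbGetQuery(
    con,
    "SELECT ticker, SUM(qty) AS total_qty FROM trades GROUP BY ticker ORDER BY ticker"
  )

  # Always close the connection when finished
  DBI::dbDisconnect(con)
}

print(result)
```

The same dbConnect(), dbGetQuery(), and dbDisconnect() calls apply to other database systems; only the driver passed to dbConnect() and its connection arguments change.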
2.13.5 Reading and Writing Data from the Web
• readr Package:
– Read from Web: readr::read_csv("http://example.com/data.csv")
• httr and jsonlite Packages: For reading from APIs or JSON files.
– Read JSON: jsonlite::fromJSON("http://example.com/data.json")
– Write JSON: jsonlite::write_json(data, "file.json")
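The JSON functions can be tried without a web connection, as in this sketch. It assumes the jsonlite package is installed and skips the work otherwise; the data frame is made up. toJSON() converts an R object to JSON text in memory, while write_json() writes it to a file.

```r
json_ok <- requireNamespace("jsonlite", quietly = TRUE)

if (json_ok) {
  # Hypothetical data
  quotes <- data.frame(ticker = c("AAA", "BBB"), price = c(10.5, 20.25))

  # Convert to JSON text in memory and parse it back
  json_text <- jsonlite::toJSON(quotes)
  from_text <- jsonlite::fromJSON(json_text)
  print(json_text)

  # Write to a file and read it back
  json_path <- tempfile(fileext = ".json")
  jsonlite::write_json(quotes, json_path)
  from_file <- jsonlite::fromJSON(json_path)
  print(from_file)
}
```

fromJSON() accepts JSON text, a file path, or a URL, which is why the same function serves both local files and web sources.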
2.13.6 Other File Formats
• Feather and Parquet Files: Highly efficient, cross-language formats for big data.
– arrow Package: arrow::read_feather("file.feather") and arrow::read_parquet("file.parqu
These tools and packages make R highly flexible for data manipulation, allowing users to
handle data in various formats easily and efficiently.
For further reading, see Chen (2024) and James et al. (2021).
References
Bennett, Mark J., and Dirk L. Hugen. 2016. Financial Analytics with R: Building a Laptop Laboratory for Data Science. Cambridge, UK: Cambridge University Press.
Chen, Jenny K. 2024. Financial Data Analytics with R: Monte-Carlo Validation. 1st ed. Boca Raton: Chapman and Hall/CRC. https://doi.org/10.1201/9781003469704.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.