R Programming
Brainalyst’s All You Need To Know Series
To Become a Successful Data Professional
ABOUT BRAINALYST
Brainalyst is a pioneering data-driven company dedicated to transforming data into actionable insights and
innovative solutions. Founded on the principles of leveraging cutting-edge technology and advanced analytics,
Brainalyst has become a beacon of excellence in the realms of data science, artificial intelligence, and machine
learning.
OUR MISSION
At Brainalyst, our mission is to empower businesses and individuals by providing comprehensive data solutions
that drive informed decision-making and foster innovation. We strive to bridge the gap between complex data and
meaningful insights, enabling our clients to navigate the digital landscape with confidence and clarity.
WHAT WE OFFER
• Data Strategy Development: Crafting customized data strategies aligned with your business
objectives.
• Advanced Analytics Solutions: Implementing predictive analytics, data mining, and statistical
analysis to uncover valuable insights.
• Business Intelligence: Developing intuitive dashboards and reports to visualize key metrics and
performance indicators.
• Machine Learning Models: Building and deploying ML models for classification, regression,
clustering, and more.
• Natural Language Processing: Implementing NLP techniques for text analysis, sentiment analysis,
and conversational AI.
• Computer Vision: Developing computer vision applications for image recognition, object detection,
and video analysis.
• Workshops and Seminars: Hands-on training sessions on the latest trends and technologies in
data science and AI.
• Customized Training Programs: Tailored training solutions to meet the specific needs of
organizations and individuals.
2021-2024
4. Generative AI Solutions
As a leader in the field of Generative AI, Brainalyst offers innovative solutions that create new content and
enhance creativity. Our services include:
• Content Generation: Developing AI models for generating text, images, and audio.
• Creative AI Tools: Building applications that support creative processes in writing, design, and
media production.
• Generative Design: Implementing AI-driven design tools for product development and
optimization.
OUR JOURNEY
Brainalyst’s journey began with a vision to revolutionize how data is utilized and understood. Founded by
Nitin Sharma, a visionary in the field of data science, Brainalyst has grown from a small startup into a renowned
company recognized for its expertise and innovation.
KEY MILESTONES:
• Inception: Brainalyst was founded with a mission to democratize access to advanced data analytics and AI
technologies.
• Expansion: Our team expanded to include experts in various domains of data science, leading to the
development of a diverse portfolio of services.
• Innovation: Brainalyst pioneered the integration of Generative AI into practical applications, setting new
standards in the industry.
• Recognition: We have been acknowledged for our contributions to the field, earning accolades and
partnerships with leading organizations.
Throughout our journey, we have remained committed to excellence, integrity, and customer satisfaction.
Our growth is a testament to the trust and support of our clients and the relentless dedication of our team.
Choosing Brainalyst means partnering with a company that is at the forefront of data-driven innovation. Our
strengths lie in:
• Expertise: A team of seasoned professionals with deep knowledge and experience in data science and AI.
• Customer Focus: A dedication to understanding and meeting the unique needs of each client.
• Results: Proven success in delivering impactful solutions that drive measurable outcomes.
JOIN US ON THIS JOURNEY TO HARNESS THE POWER OF DATA AND AI. WITH BRAINALYST, THE FUTURE IS
DATA-DRIVEN AND LIMITLESS.
TABLE OF CONTENTS
1. Preface
2. Introduction to R Programming
• History of R Programming
• Evolution and Expansion
• Popularity and Community
3. Getting Started with R
• Installation of R and Environment Setup
• The R Interface
• What is RStudio and How Does It Improve the R Experience?
4. Characteristics of R Programming
• Simplicity and Effectiveness
• Data Analysis Focus
• Robust Programming Concepts
• Integrated Toolset
5. Basics of R Programming: Getting Started
• Your First R Equation
• Comments in R
• Assignment Operator
• Working Directory
• Data Structures in R
6. Data Structures in R Programming
• Atomic Vectors
• Lists
• Arrays
• Matrices
• Data Frames
• Factors
7. Execution of the Code
• Console
• R Script
• Execution of Code in R Scripts
8. Variables in R Programming
• Assigning Variables in R Programming
• Displaying Variable Values
9. Operators in R Programming
• Arithmetic Operators
• Relational Operators
• Logical Operators
• Package Reference Operator
10. Data Types in R
• Integer
• Complex
• Character
• Factor
• Logical
• Date and Time
11. Packages in R
• What Constitutes a Package?
• Utilizing Packages in R
• Diverse Categories of Libraries in R
12. Importing and Exporting Data in R
• Importing Data into R
• Exporting Data from R
13. Reserved Keywords in R Programming
• If Statement
• Else Statement
• Repeat Keyword
• While Keyword
• Function Keyword
• For Loop
• Next Keyword
• Break Keyword
• TRUE/FALSE
• NULL
• Inf and NaN
• NA
14. R Functions
• Components of Functions
• Types of Functions
• Function Calls
• Built-in Functions in R
15. R Data Visualization
• R Visualization Packages
• R Graphics
• Standard Graphics
• Graphics Devices
• Basics of the Grammar of Graphics
• Advantages and Disadvantages of Data Visualization in R
16. Creating Charts and Graphs in R
• R Pie Charts
• Bar Charts in R
• Boxplots in R
• Histograms in R
• Line Graphs in R
• Scatterplots in R
17. Data Exploration with R
• Importing Data
• Calculate Basic Descriptive Statistics
• List Variable Names
• Calculate the Number of Rows and Columns
• Display Dataset Structure
• Viewing Rows
• Selecting Random Rows
• Counting Missing Values
18. Data Manipulation with R
• Replacing/Recoding Values
• Renaming Variables
• Keeping and Dropping Variables
• Subset Data
• Sorting
• Value Labeling
• Dealing with Missing Data
• Aggregate by Groups
• Frequency for Vector
• Merging (Matching)
• Removing Duplicates
• Combining Columns and Rows
19. Filtering Data in R
• Filtering by a Single Condition
• Filtering Based on Multiple Conditions
• Filtering by Row and Column Position
• Selecting Non-Missing Data
20. Linear Regression in R
• Steps for Performing Linear Regression
• Creating a Relationship Model
• Using the Predict Function
• Plotting the Regression
Preface
Welcome to the “Basic to Advanced R Programming” handbook, a comprehensive guide to
mastering R, one of the most powerful programming languages for statistical computing
and data analysis. Whether you’re new to R or looking to deepen your understanding, this
handbook is designed to assist you at every step of your journey.
In today’s data-driven world, the ability to analyze and visualize data effectively is paramount.
R provides a robust environment for statistical analysis, enabling users to transform raw data
into meaningful insights. This handbook covers everything from the basics of R programming
to advanced techniques, ensuring you have the knowledge to tackle complex data challenges.
Starting with the history and evolution of R, we’ll guide you through the installation process
and familiarize you with the R interface, including RStudio. As you progress, you’ll learn about
various data structures, execution of code, and the fundamentals of data manipulation and
visualization. This handbook emphasizes hands-on practice with detailed examples to help
you gain practical experience.
You’ll also explore advanced topics such as linear regression, filtering data, and data
exploration techniques. By the end of this handbook, you’ll be well-equipped to use R for
data analysis, statistical modeling, and creating sophisticated visualizations.
I am Nitin Sharma, CEO and Founder of Brainalyst – A Data-Driven Company. This handbook is the
result of extensive collaboration and dedication. I would like to acknowledge the unwavering support
from Brainalyst – A Data-Driven Company and its talented team. Their expertise and commitment
have been invaluable in bringing this comprehensive guide to life.
Join us on this journey to unlock the full potential of R programming. Let’s explore, analyze,
and visualize data together.
Nitin Sharma
Founder/CEO
Brainalyst - A Data-Driven Company
Disclaimer: This material is protected under the Copyright Act. Brainalyst © 2021-2024. Unauthorized use and/or
duplication of this material or any part of this material, including data, in any form without explicit and written
permission from Brainalyst is strictly prohibited. Any violation of this copyright will attract legal action.
BRAINALYST - R-PROGRAMMING
Birth of R (1990s):
• Development of R commenced in the early 1990s, when Ross Ihaka and Robert Gentleman embarked on
a project to create an open-source alternative to the S programming language, which was a commercial
statistical software package developed at Bell Laboratories. The primary goal was to provide researchers,
statisticians, and data analysts with a robust tool for data analysis and visualization.
• In the KDnuggets analytics software survey, R emerged as the top choice, garnering 49% of the votes. Such
surveys affirm R’s extensive scope and importance in the field of data science.
Cost-effective solution:
• R offers a cost-effective solution for performing a wide spectrum of data analysis and data science tasks.
In contrast, achieving similar functionality in SAS typically requires the purchase of a bundle of SAS software
and modules.
Companies utilizing R:
• Numerous top-tier companies and organizations rely on R for various data-driven endeavors. Examples:
• Bain
• IT Companies Using R:
• Prominent IT and professional services companies, both in India and globally, incorporate R into their
operations. Some of them include:
• Accenture
• Amadeus IT Group
• Capgemini
• Cognizant
• CSC
• HCL Technologies
• Hexaware Technologies
• HP
• IBM
• IGATE
• Infosys
• Larsen & Toubro Infotech
• Microsoft
• Mindtree
• Mphasis
• NIIT Tech
• Oracle Financial Services Software
• Paytm
• Snapdeal
• R Systems Ltd
• Tata Consultancy Services
• Tech Mahindra
• Wipro
• Contributions to statistics and data analysis: R has played a pivotal role in advancing statistical research
and data analysis methodologies. Researchers have employed R to analyze data from diverse
fields, contributing to scientific breakthroughs and business insights. Many state-of-the-art statistical
methods and machine learning algorithms are available as R packages, which makes them accessible
to a wide audience.
Installation:
• Run the downloaded installation file.
• Follow the installation instructions, generally accepting the default settings unless you have specific
preferences.
• During the installation, you may be asked to select additional components or configure options;
make choices according to your needs.
R Window
• R Editor - To access the R Editor, navigate to File >> New Script (shortcut: CTRL + N). This is the space
where you can write your code or program. You can execute your code by pressing F5.
• R Console - This is where you can view the results or output of your code. While you can write
code in the R Console, code typed there cannot be saved and edited the way a script can.
Provides
• GUI
• Debugging options (help in resolving errors)
• Auto-completion of code
• Options for saving code, code history, and objects (facilitating code reusability and sharing
of objects)
Access RStudio
• Start → RStudio
• RStudio is designed to simplify the life of R programmers. It’s an open-source tool available for
free, offering several premium features that enhance your programming experience. Unlike standard
R, RStudio provides:
• Intelligent Code Completion: RStudio suggests code completions, making your coding more
efficient.
• Syntax Highlighting: It highlights syntax, helping you spot errors and structure your code better.
• Structured R Documentation: RStudio gives you easy access to documentation, making it simple
to look up function descriptions and examples.
• Interactive Debugging Tools: Debugging in RStudio is more effective, allowing you to identify
and fix issues in your code with ease.
• In simple terms, RStudio acts as an interface enhancement for standard R. The programming process
is similar in both platforms.
Terminal
• The terminal provides a command-line interface to interact with your operating system. It is useful
for executing system commands and managing files.
• It works like the command prompt, where shell commands can be executed.
Environment
• The Environment tab shows the variables and objects you’ve created during your R session. You
can view their values and manage the workspace.
• All the objects created in R are available in the environment and are stored in RAM.
• Example: run the following code and the object appears in the Environment tab:
• Class1 = 12345678
History
• All the code that has been executed in the current and previous sessions is saved in this window.
Connection
• The Connections tab allows you to connect to databases, APIs, or other data sources. It facilitates
data retrieval and analysis.
• It is a newer, experimental feature in RStudio.
• It is being developed to connect RStudio with ODBC and Spark databases → big data.
Files
• The directory is the place where files are automatically saved and from which other files can be easily
imported.
• By default, the working directory for R is Documents, often indicated via the symbol.
• The Files pane provides access to your project’s directory structure. You can navigate files and folders
and manage your project resources.
Plots
• The Plots pane displays graphical plots and visualizations generated by R. It helps you visualize
data and interpret results graphically.
• This is where graphs are shown.
• Example- plot: object 7272 = 774
• plot(20:2000)
Packages
• This window shows all the installed packages (libraries) available in R.
• There are two types of packages: user libraries and system libraries.
• The Packages pane allows you to manage and install R packages. Packages contain functions,
datasets, and tools for specific tasks, enhancing R’s capabilities.
Help
• This pane holds the information needed to run and understand the functions provided by various libraries.
• In the console, type ?functionname and its documentation will appear in this window. Example:
?class
• Do not use ?functionname
Viewer
• The Viewer pane displays dynamic HTML content generated by R, such as interactive visualizations
or reports.
• It allows the user to access multimedia output.
• Buttons on the ribbon
• Note: important settings, including the defaults, can be checked here as well.
• Go to the Tools option → click on Global Options.
Characteristics of R Programming
• R is a specialized programming language designed for data analysis, known for its distinctive features that
enhance its power and efficiency. Some of the notable characteristics of R programming include:
Integrated Toolset:
• R provides a cohesive and integrated set of tools tailored for data analysis tasks.
Open-Source Nature:
• It is an open-source platform, freely available for use and highly extensible through community
contributions.
Vectorised Operations:
• One of R’s standout features is its ability to perform multiple calculations using vectors, streamlining
complex operations.
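A minimal sketch of what a vectorised operation looks like in practice. The vectors price and quantity are hypothetical example data, not taken from the handbook.

```r
# Hypothetical example data: prices and quantities for three items
price <- c(10, 20, 30)
quantity <- c(1, 2, 3)

# One expression multiplies the vectors element by element;
# no explicit loop is needed
revenue <- price * quantity

# Functions such as sum() also work on the whole vector at once
total <- sum(revenue)

print(revenue)   # 10 40 90
print(total)     # 140
```

The same calculation written with a loop would need an index variable and a pre-allocated result, which is why vectorised code is usually shorter and easier to read.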
Interpreted Language:
• R is an interpreted language, allowing for interactive and dynamic data analysis.
• Assignment Operator: You can assign values using <- or =; at the top level of a script the two behave
the same. For example, x <- 10 or y = 5 both assign values to variables.
• Working Directory: Use getwd() to check the current working directory. You can change it using setwd().
• File Paths in R: R uses forward slashes (/) instead of backslashes (\) for file paths (e.g.,
"C:/Users/YourName/Documents").
• Combining Values: The c() function is used to combine values into a vector.
• Code Formatting: In RStudio, press CTRL + SHIFT + A to format your code neatly.
• Handling Missing Values: R uses NA to represent missing values. To calculate sums excluding NA, use
na.rm = TRUE (by default, it is FALSE).
• Generating Sequences: The form 1:10 generates integers from 1 to 10.
• Case Sensitivity: R is case-sensitive, so use the correct case for variable and function names.
• Getting Help: To get help for a function, use the help() function (e.g., help(sum))
• Naming Conventions: Object names can include letters, numbers, underscores (_), or periods (.), but they
must start with a letter or with a period that is not followed by a number.
• Data Structures: R has various data structures like vectors, factors, data frames, matrices, arrays, and
lists. Data frames are similar to datasets in SAS.
• Importing CSV Files: Use read.csv() to import CSV files into R. For example:
mydata <- read.csv("c:/mydatafile.csv", header=TRUE).
• Opening Tables/Functions: Use the fix() function to open data frames or functions in an editor. For instance,
fix(mydata) opens the ‘mydata’ data frame.
• Viewing Source Code: To view the source code of a function, use fix() with the function’s name (e.g.,
fix(colSums)).
• Retrieving Previous Commands: In R Console, you can retrieve previous commands with the UP arrow
key and edit them to rerun.
• Installing and Loading Packages: Install packages with install.packages("package_name") and load
them with library("package_name").
• Saving Data in CSV Format: Use write.csv() to save a data frame in CSV format. For instance,
write.csv(mydata, "file1.csv").
• Saving Data in R Format: Save a data frame in R format using save(mydata, file = "mydata.RData"). To save
every object in the workspace instead, use save.image("mydata.RData").
• Loading RData: To load an RData file, use load("mydata.RData").
• Using attach(): The attach() function adds a data frame to R’s search path so that its columns can be
referred to by name. For example, attach(mydata) attaches the ‘mydata’ data frame.
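Several of the basics listed above can be tried together in one short session. This is a sketch only: the data values are hypothetical, tempfile() is used so that nothing is written to your real folders, and save() is used here to store a single object.

```r
# c() combines values into a vector; NA marks a missing value
scores <- c(12, NA, 7, 1)
sum(scores)                 # NA, because na.rm defaults to FALSE
sum(scores, na.rm = TRUE)   # 20

# 1:10 generates the integers from 1 to 10
one_to_ten <- 1:10

# Write a small data frame to CSV and read it back
mydata <- data.frame(id = 1:3, score = c(12, 7, 1))
csv_path <- tempfile(fileext = ".csv")
write.csv(mydata, csv_path, row.names = FALSE)
mydata_again <- read.csv(csv_path, header = TRUE)

# save() stores the named object; load() restores it under the same name
rdata_path <- tempfile(fileext = ".RData")
save(mydata, file = rdata_path)
rm(mydata)
load(rdata_path)
print(mydata)
```

Passing row.names = FALSE to write.csv() keeps the row numbers from being written as an extra column, which makes the round trip through read.csv() cleaner.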
Data Types in R:
• In programming languages, we utilize variables to store and manage various data. Variables act as designated
memory locations where program information is stored. When we create a variable within our code,
it prompts the allocation of memory space.
• R programming supports multiple data types, including integers, strings, and more. The allocation of
memory by the operating system depends on the specific data type associated with the variable, dictating
the kind of data that can be stored within the reserved memory.
• In R programming, we encounter a range of data types, including:
Lists:
• Lists in R exhibit remarkable versatility by accommodating elements of divergent data types.
• Unlike atomic vectors, lists defy constraints by allowing mixed data types within.
• Lists are often referred to as generic vectors due to their capacity to house an assortment of R
objects.
• my_list <- list(Name = "John", Age = 30, Married = TRUE)
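A short sketch of how the elements of such a list can be read and extended. The City element and its value are hypothetical additions for illustration.

```r
my_list <- list(Name = "John", Age = 30, Married = TRUE)

my_list$Name        # access an element by name with $
my_list[["Age"]]    # access an element by name with [[ ]]
my_list[[3]]        # access an element by position

# Each element keeps its own data type
sapply(my_list, class)

# Elements can be added after the list has been created
my_list$City <- "Pune"   # hypothetical value
length(my_list)          # 4
```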
Arrays:
• Arrays serve as repositories for data extending beyond two dimensions.
• These structures house data of uniform data types, and they feature contiguous memory allocation.
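As an illustration, the array() function can build a three-dimensional array. The values and dimensions below are arbitrary choices for the sketch.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
my_array <- array(1:24, dim = c(2, 3, 4))

dim(my_array)       # 2 3 4

# Index with [row, column, layer]
my_array[1, 2, 3]   # 15

# A whole layer is a 2 x 3 matrix
my_array[, , 1]
```

R fills arrays column by column, layer by layer, which is why position [1, 2, 3] holds the value 15.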
Matrices:
• Matrices adopt a two-dimensional, rectilinear format for storing elements.
• Constituent elements within a matrix share a common atomic type, often numeric to facilitate
mathematical computations.
• Matrices are brought into existence through the utilization of the matrix() function.
• my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
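The matrix above is filled column by column, which is the default. A brief sketch of that behaviour, of the byrow argument, and of some common matrix arithmetic:

```r
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
my_matrix
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills the matrix row by row instead
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
by_row[1, ]   # 1 2 3

doubled <- my_matrix * 2                 # element-wise multiplication
transposed <- t(my_matrix)               # 3 x 2 transpose
product <- my_matrix %*% t(my_matrix)    # 2 x 2 matrix product
```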
Data Frames:
• Data frames resemble a two-dimensional array and present data in tabular form.
• Each column holds a specific variable, while each row holds one set of corresponding
values.
• Data frames are instrumental in data analysis. Their row names must be unique and their
column names non-empty.
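A minimal sketch of creating and inspecting a data frame with data.frame(). The people data set and its values are hypothetical.

```r
# Hypothetical data: one row per person, one column per variable
people <- data.frame(
  name = c("Asha", "Ben", "Chen"),
  age = c(28, 35, 41),
  married = c(FALSE, TRUE, TRUE)
)

str(people)     # structure: column names and types
nrow(people)    # 3
ncol(people)    # 3

people$age                            # one column as a vector
over_30 <- people[people$age > 30, ]  # rows that meet a condition
print(over_30)
```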
Factors:
• Factors are specialized data entities employed for classification and the storage of data as discrete
levels.
• They possess the capability to encompass both character strings and integers.
• Factors prove especially valuable when confronted with columns harboring a finite number of
unique values.
• The factor() function is harnessed for the creation of factors within R.
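A short sketch of factor() on a column with a few repeated values. The sizes vector is hypothetical example data.

```r
sizes <- c("small", "large", "medium", "small", "large")

# levels sets the allowed values and their order
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

levels(size_factor)       # "small" "medium" "large"
table(size_factor)        # count of observations per level
as.integer(size_factor)   # 1 3 2 1 3, the level codes stored internally
```

Internally a factor stores integer codes plus the level labels, which is what lets it represent both the text and its discrete category.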
1. Vectors:
• One-dimensional data structures in R are called vectors. All elements of a vector must be of the
same data type.
Quantitative (numeric) vector:
• Numeric vectors hold numerical values, such as integers or decimals.
Updated Illustration:
• Numeric values: c(1, 2, 3, 4, 5)
• Text or character values are stored in character vectors.
• Updated Illustration: character values:
• c("Apple", "Banana", "Cherry")
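The two illustrations can be run as follows. The variable names are chosen for this sketch, and the last lines show what happens when types are mixed inside c().

```r
numbers <- c(1, 2, 3, 4, 5)
fruits <- c("Apple", "Banana", "Cherry")

class(numbers)   # "numeric"
class(fruits)    # "character"

numbers[2]         # second element: 2
fruits[c(1, 3)]    # first and third elements

# A vector holds one type only, so mixed input is converted to a common type
mixed <- c(1, "Apple", TRUE)
class(mixed)     # "character"
```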
• R script
• It is like a page in a notebook.
• Here we can give the script a name, save it, reuse it, and edit its code. We can also share
R scripts with others.
Example:
• # Calculate the sum of numbers from 1 to 10
• sum_result <- sum(1:10)
• print(sum_result)
• Note: Write code in an R script, because code typed in the console cannot be saved and edited.
• Example: if you have 1,000 lines of code, select the code using a keyboard shortcut.
Variable in R programming
• In R programming, variables act as containers for storing and referencing data that requires
processing. Variables are versatile and can hold various types of information, including atomic
vectors, collections of atomic vectors, or combinations of different R objects.
• Unlike statically typed languages such as C++, R adopts dynamic typing: R determines the data type
of a variable at the moment the corresponding statement is executed. Variable names in R follow
certain rules: a valid variable name can contain letters, numbers, dots, and underscores. However, a
variable name must start with a letter or with a dot that is not followed by a number.
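A small sketch of dynamic typing, the naming rules, and two ways of displaying a value. All variable names and values are hypothetical.

```r
# Dynamic typing: the same name can hold different types over time
x <- 42
class(x)   # "numeric"
x <- "forty-two"
class(x)   # "character"

# Valid names: letters, numbers, dots, and underscores
total_sales <- 100
total.sales2 <- 200
.hidden <- 300        # starts with a dot that is not followed by a number

# Displaying variable values
print(total_sales)
cat("Combined:", total_sales + total.sales2, "\n")
```

A name such as 2total or .2total would be rejected by R, because it starts with a number or with a dot followed by a number.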
Arithmetic operator
• + (addition)
• / (division)
• * (multiplication)
• - (subtraction)
• %% (modulus, i.e., remainder)
• %/% (integer division)
• Example: 10+20
• 23*23
• 30/3
• 50-45
• 100%/%23
• 234%%34
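The arithmetic operators can be checked directly in the console. The sketch below stores each result in a variable (names chosen for illustration) so the values can be inspected afterwards.

```r
added      <- 10 + 20      # 30
subtracted <- 50 - 45      # 5
multiplied <- 23 * 23      # 529
divided    <- 10 / 4       # 2.5
remainder  <- 234 %% 34    # 30, because 234 = 6 * 34 + 30
whole_part <- 100 %/% 23   # 4, because 23 fits into 100 four times

print(c(added, subtracted, multiplied, divided, remainder, whole_part))
```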
Relational operators
• Whenever we use a relational operator, we compare two objects, and the output is Boolean.
• Boolean is also known as logical.
• A Boolean is always either TRUE or FALSE.
• > (greater than)
• < (less than)
• <= (less than or equal to)
• >= (greater than or equal to)
• == (equal to)
• != (not equal to)
• The difference between = and ==
• a=10
• b=30
• a==b (compares whether the values inside a and b are the same)
• a=b (assignment operator: saves the value of b inside a)
• a
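A runnable sketch of the comparison operators and of the difference between == and =. Result variable names are chosen for illustration.

```r
a <- 10
b <- 30

a > b     # FALSE
a < b     # TRUE
a <= b    # TRUE
a >= b    # FALSE

before <- a == b   # FALSE: comparison only, nothing is changed
a = b              # assignment: a now holds the value of b
after <- a == b    # TRUE

# Comparisons are vectorised: one result per element
flags <- c(1, 5, 10) > 4
print(flags)       # FALSE TRUE TRUE
```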
Logical operator
• And &
• Or |
• Not !
• Here also the output is Boolean. We will use these operators in the data manipulation functions.
• Example: l1 = 10 < 5
• l2 = TRUE
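A brief sketch of the logical operators, first on single values and then on a vector, which is how they are typically used when filtering data. The ages vector is hypothetical.

```r
l1 <- 10 < 5    # FALSE
l2 <- TRUE

both   <- l1 & l2   # FALSE: AND needs both sides to be TRUE
either <- l1 | l2   # TRUE: OR needs at least one side to be TRUE
negated <- !l1      # TRUE: NOT flips the value

# Combining conditions on a vector
ages <- c(15, 22, 37, 64)
working_age <- ages > 18 & ages < 60   # FALSE TRUE TRUE FALSE
ages[working_age]                      # 22 37
```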
Other operators
• You may come across other operators as well; this section will be updated to cover them.
In R, we have multiple kinds of data types, but the major ones are as follows:
The data type related to number
1. Integer
2. Numeric
3. Short number with decimal
4. Long number with decimal
5. Short number without decimal
6. Long number without decimal
Integer
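• It is found in special cases, such as during the importing of certain files.
• Integers are limited in size. R stores them in the range
• -2147483647 to 2147483647
• Only numbers without decimals in this range can have integer as the data type. A number typed without
a suffix, such as 34, is stored as numeric; add the L suffix (34L) to store it as an integer.
• Example:
• n1=34L
• n2=45L
• n3=-87L
• n4=32434L
• class(n1)
• class(n2)
• class(n3) -----find data type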
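The difference between numeric and integer storage can be seen with class(). In this sketch the variable names are illustrative, and .Machine$integer.max is used to show the largest integer R can store.

```r
n1 <- 34      # no suffix: stored as numeric (double)
n2 <- 34L     # L suffix: stored as integer

class(n1)        # "numeric"
class(n2)        # "integer"
is.integer(n1)   # FALSE

# Convert an existing numeric value to integer
n3 <- as.integer(n1)
class(n3)        # "integer"

# Largest value the integer type can hold
.Machine$integer.max   # 2147483647
```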
Complex
• Real + imaginary
• For example, 5+6i
Packages in R
• Packages, also known as libraries, are a crucial component of the R programming environment.
They serve to expand the capabilities of base R by offering an extensive range of additional functions,
tools, and documentation. These packages are indispensable for a variety of tasks, encompassing
data manipulation, data visualization, statistical analysis, and more. Essentially, packages
consist of pre-written code collections that enhance R’s utility, making it a potent language for data
analysis and computational tasks.
What Constitutes a Package?
• A package essentially comprises a collection of functions bundled together. These functions are
systematically organized within the package, often resembling a compressed archive or a regular
directory. In essence, you can conceptualize a package as a specialized toolbox tailored for specific
tasks. For instance, a package dedicated to financial data analysis might encompass functions associated
with account management, and it can conveniently be referred to as the “AccManipulation”
library.
Utilizing Packages in R
• To leverage the capabilities of packages in R, it involves several key steps:
Package Installation:
• Before making use of a package, you must install it. This is accomplished through the
install.packages("package_name") function. For instance, to install the widely used ggplot2
package for data visualization, run install.packages("ggplot2").
Package Loading:
• Once a package is successfully installed, it needs to be loaded into your current R session. This is
achieved with the library(package_name) function. For example, to load the ggplot2
package, run library(ggplot2).
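One common pattern, sketched here as an assumption rather than a required workflow, is to install a package only when it is missing and then load it. The package name is held in a variable; "stats" is used because it ships with R, so running the sketch downloads nothing. Replace it with the package you need.

```r
# Name of the package to use; "stats" is a stand-in that is always present
pkg <- "stats"

# requireNamespace() returns FALSE when the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE tells library() that pkg is a variable holding the name
library(pkg, character.only = TRUE)

# The search path lists every package that is currently loaded
search()
```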
System Libraries:
• System libraries are integral components that come pre-installed with R by default. They are meticulously
developed and maintained by the core developers of R. When you opt to install R (with
the base option), you inherently acquire these libraries. System libraries encompass “utils,” which
provides essential R functionalities, and “stats,” which facilitates statistical operations.
User Libraries:
• In contrast, user libraries are third-party libraries formulated and maintained by the vibrant R
community. These libraries are typically accessible for download and utilization from the Comprehensive
R Archive Network (CRAN) repository. Functions encompassed within user libraries may
be implemented in diverse programming languages, including R (the native language), C++, Perl,
Java, Julia, and more. These libraries furnish an extensive array of specialized tools and functions
meticulously tailored to cater to a myriad of diverse needs.
CSV Files
• Method: To import data from a CSV file, use the read.csv() function.
• Syntax: read.csv(file, header = TRUE, sep = ",", ...)
• Description: This function is specifically designed to read data from comma-separated values
(CSV) files. Here’s a breakdown of the parameters:
• file: Specifies the path to the CSV file.
• header: A logical value indicating whether the first row contains column names.
• sep: Defines the delimiter used in the CSV file (typically a comma).
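A self-contained sketch of read.csv() and its header and sep parameters. The file is created in a temporary location with hypothetical contents so the example can be run anywhere.

```r
# Create a small CSV file to read (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Asha,90", "Ben,85"), csv_path)

# header = TRUE: the first row supplies the column names
students <- read.csv(csv_path, header = TRUE, sep = ",")
str(students)

# header = FALSE: the first row is treated as data and
# the columns receive the default names V1, V2
raw <- read.csv(csv_path, header = FALSE, sep = ",")
names(raw)
```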
If Statement:
• The if statement is used to conditionally execute one or more statements based on a Boolean
expression in R programming.
• Example:
• a <- 12
• if (a < 15) {
• cat("I am less than 15")
• }
• Output: I am less than 15
Else statement:
• The else statement is used in conjunction with the if statement. It specifies the code block to be
executed when the if condition is false.
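A minimal sketch of if combined with else. The variable names and messages are hypothetical.

```r
a <- 20

if (a < 15) {
  message_text <- "a is less than 15"
} else {
  # Runs because the condition a < 15 is FALSE
  message_text <- "a is 15 or more"
}

cat(message_text, "\n")
```

Note that else must appear on the same line as the closing brace of the if block; otherwise R treats the if statement as already complete.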
Repeat Keyword:
• The repeat keyword is used to create a loop that iterates indefinitely. To exit the loop, the break
statement is used.
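A small sketch of repeat with break. The counter and the limit of three iterations are arbitrary choices for illustration.

```r
counter <- 0

repeat {
  counter <- counter + 1
  cat("Iteration", counter, "\n")

  # Without this break the loop would never stop
  if (counter >= 3) {
    break
  }
}
```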
While Keyword:
• The while keyword is used to create a loop that executes a block of code as long as a specified
condition is true.
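A short sketch of a while loop that adds the numbers 1 to 5. Variable names are chosen for illustration.

```r
i <- 1
total <- 0

# The condition is checked before every pass through the loop
while (i <= 5) {
  total <- total + i
  i <- i + 1
}

print(total)   # 15
```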
Function Keyword:
• The function keyword is used to define user-defined functions in R. Functions encapsulate a set of
statements that can be executed as a single unit.
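A minimal sketch of defining and calling a function. The name square is hypothetical.

```r
# Define a function that takes one argument
square <- function(x) {
  x^2   # the last expression is the value returned
}

square(4)     # 16
square(1:3)   # 1 4 9, because ^ is vectorised
```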
for Loop:
• The for keyword is used for looping or iterating over a sequence (e.g., a vector or list).
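Two short for loops as a sketch: one iterates over the values of a vector, the other over positions. The data is hypothetical.

```r
# Iterate over the elements of a character vector
fruits <- c("Apple", "Banana", "Cherry")
for (fruit in fruits) {
  cat("Fruit:", fruit, "\n")
}

# Iterate over positions and fill a pre-allocated result
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
print(squares)   # 1 4 9 16 25
```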
Next Keyword:
• The next keyword is used to skip the current iteration of a loop without terminating it, proceeding to the
next iteration.
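A sketch of next that collects only the odd numbers from 1 to 10 by skipping the even ones. Variable names are illustrative.

```r
odd_numbers <- c()

for (i in 1:10) {
  if (i %% 2 == 0) {
    next   # skip even numbers and continue with the next i
  }
  odd_numbers <- c(odd_numbers, i)
}

print(odd_numbers)   # 1 3 5 7 9
```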
Break Keyword:
• The break keyword is used to terminate a loop prematurely if a specified condition is met.
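A sketch of break that stops a loop as soon as the first odd value is found. The values vector is hypothetical.

```r
values <- c(4, 8, 15, 16, 23, 42)
first_odd <- NA
checked <- 0

for (v in values) {
  checked <- checked + 1
  if (v %% 2 != 0) {
    first_odd <- v
    break   # leave the loop; the remaining values are not visited
  }
}

print(first_odd)   # 15
print(checked)     # 3
```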
TRUE/FALSE:
• TRUE and FALSE are keywords used to represent Boolean true and false values in R, respectively.
NULL:
• NULL represents the null object in R and is used for absent or undefined values; missing data values are represented by NA.
NA:
• NA is a logical constant representing missing values in R, and it can be used with different atomic
vector types.
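A short sketch contrasting NULL and NA. The variable names and values are hypothetical.

```r
# NULL: an empty object with length 0
nothing <- NULL
is.null(nothing)   # TRUE
length(nothing)    # 0

# NA: a placeholder that still occupies a position in the vector
ages <- c(25, NA, 40)
length(ages)                # 3
is.na(ages)                 # FALSE TRUE FALSE
mean(ages)                  # NA
mean(ages, na.rm = TRUE)    # 32.5

# NA is logical by default; typed variants exist for other vector types
class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```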
R Functions
• In the realm of R programming, functions serve as organized sets of statements designed to execute
specific tasks. While R boasts an array of pre-existing functions, users also have the liberty to craft their
own customized functions. Functions occupy a pivotal role in programming, fostering modularity and
reusability.
• Functions help guard against the twin perils of code repetition and complexity. They effectively
break code into smaller, logically structured components. A well-crafted R function adheres to the
following principles:
• It is tailored to execute a precise task.
• It may accept arguments (input values) as needed.
• It encompasses a body where the task-specific code is housed.
• It may, optionally, yield one or more output values.
• In R, functions come to life through the function keyword. A function definition takes the general
shape function_name <- function(arguments) { body }.
• Functions stand as the cornerstone of R programming, facilitating code organization, reusability, and
manageability.
Components of Functions
• In the context of R functions, there exist four foundational elements that define their structure and
behavior:
• Function Name: This denotes the specific name assigned to the function. In R, functions are treated as
objects and are stored with their corresponding names.
• Arguments: Arguments function as placeholders within a function. In R, functions can possess optional
arguments, signifying that they may or may not necessitate input values. Default values can also be
assigned to these arguments. When invoking a function, values are supplied to these arguments.
• Function Body: The function body encompasses a collection of statements that delineate the actions and
operations the function undertakes. It elucidates the logic and tasks executed by the function.
• Return Value: The return value constitutes the ultimate expression within the function body, which
undergoes evaluation and is subsequently returned as the output of the function.
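The four components can be pointed out in a small function. The function rectangle_area and its arguments are hypothetical, used only for illustration.

```r
# Function name: rectangle_area
# Arguments: width (required) and height (optional, default value 1)
rectangle_area <- function(width, height = 1) {
  # Function body: the statements that do the work
  area <- width * height

  # Return value: the last expression evaluated in the body
  area
}

rectangle_area(3, 4)   # 12
rectangle_area(3)      # 3, because height falls back to its default

# The components can also be inspected directly
names(formals(rectangle_area))   # "width" "height"
body(rectangle_area)
```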
Types of Functions
• User-defined Function: R empowers users to craft their own customized functions to cater to specific
programming needs. Once formulated, these user-defined functions can be employed in a manner akin to
built-in functions.
Function Calls
• Functions can be called in various ways:
• With Arguments: Passing values to the function when calling it, supplying necessary
input.
• Without Arguments: Invoking a function without providing input values.
• With Argument Values: Supplying arguments in the same sequence as defined in the
function or using argument names.
• With Default Arguments: Functions can have default argument values, which are used if
no values are provided during the function call.
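The calling styles listed above can be sketched with one function. The function greet and its arguments are hypothetical.

```r
greet <- function(name = "World", punctuation = "!") {
  paste0("Hello, ", name, punctuation)
}

greet()                                # without arguments: "Hello, World!"
greet("R", "?")                        # by position: "Hello, R?"
greet(punctuation = ".", name = "R")   # by name, in any order: "Hello, R."
greet("R")                             # default used for punctuation: "Hello, R!"
```

Naming the arguments makes a call easier to read and protects it from mistakes if the order of the arguments is not what you expect.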
Built-in Functions in R
• Built-in functions, also referred to as predefined functions, are functions that come prepackaged with the
R programming language. R boasts an extensive repertoire of these functions, which cater to a wide range
of tasks and operations that users might want to perform. These built-in functions are conveniently
categorized according to their specific functionalities.
Mathematical Functions
• Within the realm of mathematical operations, R offers a diverse array of built-in functions. These func-
tions are tailored to assist with mathematical calculations and manipulations, enhancing the numerical
capabilities of the language.
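• A few commonly used mathematical functions from base R, shown on a small made-up vector:

```r
x <- c(-4.7, 2.25, 9)   # illustrative values

abs(x)            # absolute values: 4.70 2.25 9.00
sqrt(9)           # square root: 3
round(2.567, 1)   # round to one decimal place: 2.6
ceiling(2.25)     # smallest integer not below 2.25: 3
floor(-4.7)       # largest integer not above -4.7: -5
total <- sum(x)   # sum of all elements: 6.55
largest <- max(x) # largest element: 9
```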
Character Functions
• Character functions in R serve as indispensable tools for managing and manipulating character data.
These functions empower users to handle and transform text-based information effectively.
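• A short sketch of some base R character functions, applied to an illustrative string:

```r
text <- "Data Science"   # illustrative value

upper  <- toupper(text)                      # "DATA SCIENCE"
lower  <- tolower(text)                      # "data science"
len    <- nchar(text)                        # number of characters: 12
first4 <- substr(text, 1, 4)                 # characters 1 to 4: "Data"
joined <- paste("R", "Programming")          # joined with a space: "R Programming"
swap   <- gsub("Science", "Analysis", text)  # replace a pattern: "Data Analysis"
```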
R Data Visualization
• Creating compelling data visualizations in R is both accessible and powerful, thanks to R’s inherent ca-
pabilities. Data visualization is a valuable method for extracting meaningful insights from data by repre-
senting it visually. Visualizations bring to light complex data patterns that might otherwise go unnoticed.
• Utilizing data visualization techniques makes it easier to explore and comprehend large datasets, simpli-
fying the process of uncovering essential insights.
R Visualization Packages
• R provides a wealth of packages designed specifically for data visualization tasks. These packages include
a wide range of tools and libraries:
1) Plotly
• Description: The plotly package offers a platform for creating high-quality, interactive online graphs.
It builds upon the JavaScript library “plotly.js” to provide users with visually appealing and interac-
tive charting capabilities.
• Syntax: To create a plotly chart, you can use the plot_ly() function.
• When to Use: Use plotly when you need to generate web-based interactive graphs with features like
zooming, hovering tooltips, and more.
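• A minimal plot_ly() sketch, assuming the plotly package is installed. It uses R's built-in mtcars dataset,
which is a choice made here for illustration:

```r
# Runs only when plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  # Formulas (~wt, ~mpg) refer to columns of the data frame
  fig <- plot_ly(data = mtcars, x = ~wt, y = ~mpg,
                 type = "scatter", mode = "markers")
  # Typing fig at the console renders the interactive chart
}
```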
2) ggplot2
• Description: ggplot2 is a widely used R package for creating elegant and high-quality declarative
graphics. It stands out for its ability to generate aesthetically pleasing and customizable graphs.
• Syntax: To create a ggplot2 chart, you use the ggplot() function and add layers with the + operator.
• When to Use: ggplot2 is ideal for creating static, publication-quality plots and customizing
visuals to suit specific requirements.
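• The layering idea can be sketched as follows, assuming ggplot2 is installed. The dataset (mtcars) and the
labels are illustrative choices:

```r
# Runs only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # data and aesthetic mappings
    geom_point(colour = "steelblue") +          # layer: one point per car
    labs(title = "Fuel efficiency vs. weight",  # layer: title and axis labels
         x = "Weight (1000 lbs)", y = "Miles per gallon")
  # print(p) draws the plot on the active graphics device
}
```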
3) Tidyquant
• Description: tidyquant is a financial package within the tidyverse ecosystem. It’s designed for im-
porting, analyzing, and visualizing financial data, making it a valuable tool for quantitative financial
analysis.
• Syntax: tidyquant offers a range of functions for financial data analysis and visualization, often used
in combination with other tidyverse packages like dplyr and ggplot2.
• When to Use: Use tidyquant when working with financial data, performing stock market anal-
ysis, or creating financial visualizations.
4) taucharts
• Description: taucharts is a library that focuses on data visualization, allowing users to map data fields
to visual properties declaratively. It provides a flexible platform for exploring data visually.
• Syntax: You can create taucharts by specifying data mappings and visual properties using R func-
tions.
• When to Use: taucharts is suitable for exploratory data visualization and when you want to
create custom, interactive charts.
5) ggiraph
• Description: ggiraph is a tool for creating dynamic ggplot2 graphs. It enables the addition of tooltips,
JavaScript actions, and animations to ggplot2 graphics, enhancing their interactivity.
• Syntax: You build a ggplot2 plot with ggiraph's interactive geoms, such as geom_point_interactive(),
map a column to the tooltip aesthetic, and render the result with the girafe() function.
• When to Use: ggiraph is great when you need to make your ggplot2 graphs interactive and
informative.
6) geofacet
• Description: The geofacet package extends ‘ggplot2’ to provide geofaceting functionality. Geofaceting
allows arranging plots for different geographical entities into a grid, preserving geographical
orientation.
• Syntax: You use ggplot2 syntax and replace the usual faceting call with facet_geo() to create
geofaceted plots.
• When to Use: geofacet is valuable when you want to create grid-based geographic visualizations
with ggplot2.
7) googleVis
• Description: googleVis facilitates creating interactive web pages with Google’s chart tools based on
R data frames. It acts as a bridge between R and Google’s charting capabilities.
• Syntax: You can create various Google charts using functions like gvisPieChart() or gvisLineChart().
• When to Use: googleVis is suitable for generating web-based interactive charts, especially
when integrating R with web applications.
8) RColorBrewer
• Description: RColorBrewer provides a collection of color schemes designed by Cynthia Brewer. It’s
particularly useful for selecting color palettes for maps and other visualizations.
• Syntax: You can set color palettes using functions like brewer.pal() and apply them to your plots.
• When to Use: RColorBrewer is handy when you want aesthetically pleasing and distinguish-
able color schemes for your visualizations.
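• A small sketch of brewer.pal() in use, assuming RColorBrewer is installed. The bar heights and labels are
made-up values:

```r
# Runs only when RColorBrewer is available
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Four colors from the qualitative "Set2" palette
  palette_colors <- RColorBrewer::brewer.pal(n = 4, name = "Set2")
  barplot(c(12, 7, 9, 4),
          names.arg = c("A", "B", "C", "D"),
          col = palette_colors,
          main = "Bars colored with a ColorBrewer palette")
}
```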
9) dygraphs
• Description: dygraphs is an R interface to the dygraphs JavaScript charting library. It specializes in
charting time-series data with interactive features.
• Syntax: You use the dygraph() function to create interactive time-series plots.
• When to Use: dygraphs is ideal for exploring time-series data interactively, for example by zooming
and panning.
10) shiny
• Description: R’s shiny package enables the development of interactive web applications with ease.
It offers extensions, HTML widgets, CSS, and JavaScript to create aesthetically pleasing web apps.
• Syntax: shiny uses a reactive framework. You define a user interface and a server function, then
combine them into an app object with shinyApp().
• When to Use: shiny is the go-to choice for building interactive web applications that utilize R’s data
visualization capabilities.
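• A minimal app skeleton, assuming the shiny package is installed. The input ID "bins", the output ID "hist",
and the use of mtcars are illustrative choices:

```r
# Runs only when shiny is available
if (requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  # User interface: a slider and a placeholder for the plot
  ui <- fluidPage(
    titlePanel("Histogram of mtcars$mpg"),
    sliderInput("bins", "Number of bins:", min = 5, max = 20, value = 10),
    plotOutput("hist")
  )

  # Server: redraws the histogram whenever input$bins changes
  server <- function(input, output) {
    output$hist <- renderPlot({
      hist(mtcars$mpg, breaks = input$bins, col = "steelblue",
           main = NULL, xlab = "Miles per gallon")
    })
  }

  app <- shinyApp(ui = ui, server = server)
  # runApp(app) would launch the app in a browser
}
```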
R Graphics
• Graphics serve a crucial role in highlighting key aspects of data. They enable us to explore the distribution
of individual variables, uncover connections between different variables, and provide concise summaries,
particularly when dealing with extensive datasets. Graphics serve as a valuable addition to various statis-
tical and computational methods, enhancing our ability to analyze data effectively.
Standard Graphics
• Standard graphics in R are made accessible through the graphics package, and they encompass a variety
of functions designed for generating statistical plots. These plots include commonly used visualizations
such as scatterplots, pie charts, boxplots, barplots, and more. What makes them convenient is that you can
create these graphs with just a single function call.
Graphics Devices
• A graphics device is essentially the canvas where your plots or visuals appear. It can be your computer
screen (a screen device), a PDF file (a file device), a Scalable Vector Graphics (SVG) file (another file
device), or even a PNG or JPEG image file (yes, also a file device).
• Here are some important points to grasp about graphics devices:
• The functions related to graphics devices produce output specific to the active graphics device.
• The default and most commonly used device is the screen device, where you see your plots.
• R offers various graphics devices like the PDF device, the JPEG device, etc., each suited for different
output formats.
• You simply need to specify the graphics output device you want to use, and R takes care of producing the
correct output for that device.
• You can have multiple devices open at once, but only one can be active at a time.
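• Writing a plot to a file device might look like the sketch below. The file name scatter.pdf and the plotted
data are illustrative:

```r
# Write the plot into a PDF in the session's temporary directory
out_file <- file.path(tempdir(), "scatter.pdf")

pdf(out_file, width = 6, height = 4)  # open a PDF file device; it becomes active
plot(mtcars$wt, mtcars$mpg, main = "Written to a PDF device")
dev.off()                             # close the device so the file is completed

file.exists(out_file)                 # TRUE once the device has been closed
```

Until dev.off() is called, all plotting output goes to the PDF device rather than to the screen.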
• Cost: Developing data visualization applications can be expensive, especially for small businesses.
Employing professionals to create complex visuals may increase costs.
• Distraction: Overly complex or flashy visuals can sometimes detract from the actual insights. It’s
important to maintain a balance between aesthetics and functionality.
R Pie Charts
• In R, you can create pie charts to visualize data using the pie() function. Pie charts represent data as slices
of a circle, each labeled and colored differently. Although pie charts are often discouraged because slice
areas are harder to compare than bar lengths, you can still create them for basic visualization.
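• A minimal pie() sketch. The sales figures and region names are made-up values for illustration:

```r
sales   <- c(40, 25, 20, 15)                  # illustrative values
regions <- c("North", "South", "East", "West")
pct     <- round(100 * sales / sum(sales))    # share of each slice in percent

pie(sales,
    labels = paste0(regions, " (", pct, "%)"),
    col = c("tomato", "gold", "skyblue", "palegreen"),
    main = "Sales share by region")
```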
Bar Charts in R
• A bar chart is a graphical representation that uses bars or rectangles of equal width to display the numerical
values of categorical data. Each bar’s length or height is proportional to the value it represents. Bar charts
are a useful way to summarize and visualize categorical data in R.
• To create bar charts in R, we can use the barplot() function. Commonly used parameters include:
• names.arg: Labels for each bar on the x-axis. They correspond to the categories or variables you are
summarizing.
• main: A title or main heading for the bar chart.
• col: The color of the bars.
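• A minimal barplot() sketch using the parameters described above. The counts and category labels are
made-up values:

```r
counts <- c(12, 19, 7)   # illustrative values, one per category

# barplot() returns the x positions of the bar midpoints
bar_mid <- barplot(counts,
                   names.arg = c("Small", "Medium", "Large"),
                   main = "Orders by size",
                   xlab = "Size", ylab = "Number of orders",
                   col = "steelblue")
```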
Boxplots in R
• Boxplots are a valuable tool for understanding the distribution of data within a dataset. They split the data
at its three quartiles, allowing you to visualize key statistics such as the minimum, maximum, median, first
quartile, and third quartile. Boxplots are particularly useful for comparing the distribution of data across
multiple groups or datasets.
• You can create boxplots using the boxplot() function.
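• A short boxplot() sketch comparing groups. It uses the built-in mtcars dataset, which is a choice made here
for illustration:

```r
# One box of mpg values for each number of cylinders
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "Fuel efficiency by cylinder count",
              xlab = "Cylinders", ylab = "Miles per gallon",
              col = "lightgray")

# Each column of bp$stats holds, for one group:
# lower whisker, first quartile, median, third quartile, upper whisker
bp$stats
```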
R Histograms
• A histogram is a graphical representation that displays the frequency distribution of numerical values within
a dataset. It’s particularly useful for understanding how data is distributed across various value ranges. In
a histogram, the data is divided into intervals or bins, and each bar represents the number of values falling
within a specific range.
• Unlike bar charts, which are used to compare different categories or entities, histograms focus on
visualizing the distribution of a single dataset.
• You can create histograms using the hist() function, which takes a numeric vector as input and offers
several parameters for customization, including:
• col and border: Customize the color of the bars and their borders with these parameters.
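• A minimal hist() sketch using col and border, with the built-in mtcars dataset as illustrative input:

```r
# breaks is a suggested number of bins; R may adjust it to get round limits
h <- hist(mtcars$mpg,
          breaks = 8,
          main = "Distribution of fuel efficiency",
          xlab = "Miles per gallon",
          col = "lightblue", border = "white")

h$counts   # number of observations falling in each bin
```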
R Line Graphs
• A line graph, also known as a line chart, is a visual representation of data that changes continuously over
time. In a line graph, data points are connected with lines to illustrate the continuous progression of values.
These lines can move both upward and downward, reflecting changes in the data.
• Line graphs are valuable for comparing different events, situations, or trends over time. They are particu-
larly effective at showing patterns and trends in data.
• To create a line graph in R, you can use the plot() function with its type argument set to "l" (lines only)
or "o" (lines with points).
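• A minimal line graph sketch. The monthly visitor counts are made-up values:

```r
months   <- 1:6
visitors <- c(120, 135, 128, 150, 170, 165)   # illustrative values

# type = "o" draws both the points and the connecting lines
plot(months, visitors, type = "o", col = "blue",
     xlab = "Month", ylab = "Visitors",
     main = "Monthly visitors")
```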
R Scatterplots
• Scatterplots are invaluable for comparing variables and understanding the relationships between them.
They show visually how one variable changes with another. In a scatterplot, data is
presented as a collection of individual points, with each point denoting the values of two variables. One
variable is typically plotted on the vertical axis, while the other is plotted on the horizontal axis.
• In R, you have two primary methods for creating scatterplots: using the plot() function or utilizing
functions from the ggplot2 package.
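• A base R scatterplot sketch, using the built-in mtcars dataset as illustrative input:

```r
# Each point is one car: weight on the x-axis, fuel efficiency on the y-axis
plot(x = mtcars$wt, y = mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency vs. weight",
     pch = 19, col = "darkgreen")

# The correlation coefficient summarizes the direction of the relationship
r <- cor(mtcars$wt, mtcars$mpg)   # negative: heavier cars get fewer miles per gallon
```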
Importing Data:
• First, let’s import our dataset into R. We can use the read.csv() function to import a CSV file,
specifying header = TRUE when the first row of the file holds the column names.
• The summary() function calculates summary statistics. You can apply it to the whole data frame or
to a specific column, selected either by index or by name.
• The str() function provides information about the data frame’s structure, including the number of
observations (rows) and variables (columns).
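• The import and inspection steps can be sketched end to end. Because no data file accompanies this text,
the sketch first writes a small made-up CSV (scores.csv, a hypothetical name) to a temporary directory:

```r
# Create a small illustrative CSV file to import
csv_path <- file.path(tempdir(), "scores.csv")
writeLines(c("Name,Gender,SAT",
             "Ana,Female,1250",
             "Ben,Male,1100",
             "Cara,Female,1380"), csv_path)

scores <- read.csv(csv_path, header = TRUE)  # first row holds the column names

summary(scores)                    # summary statistics for every column
by_name  <- summary(scores$SAT)    # one column, selected by name
by_index <- summary(scores[, 3])   # the same column, selected by index
str(scores)                        # structure: 3 obs. of 3 variables
```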
5. Viewing Rows:
• You can view the first few rows of your dataset using the head() function.
• You can specify the number of rows to display by providing an argument to head(), for example
to display the first 5 rows.
• The tail() function shows the last rows, and you can specify the number of rows to display with
tail() as well.
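• A short sketch of head() and tail(), using the built-in mtcars dataset as a stand-in for your own data:

```r
head(mtcars)                     # first 6 rows (the default)
first_five <- head(mtcars, 5)    # first 5 rows
last_three <- tail(mtcars, 3)    # last 3 rows
```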
1. Replacing/Recoding Values
• Replacing Values: You can update values within a dataset. For example, let’s replace all 1s with 6s
in the “Q1” variable:
• Recoding a Range: You can recode a range of values. For instance, transform 1 through 4 to 0 and
5 through 6 to 1:
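• Both operations can be sketched in base R. The data frame survey and its values are made up for
illustration; only the variable name “Q1” comes from the text above:

```r
survey <- data.frame(Q1 = c(1, 3, 1, 5, 6, 2))   # illustrative responses

# Replacing values: every 1 in Q1 becomes 6
replaced <- survey
replaced$Q1[replaced$Q1 == 1] <- 6               # 6 3 6 5 6 2

# Recoding a range: 1 through 4 become 0, 5 through 6 become 1
recoded <- survey
recoded$Q1 <- ifelse(survey$Q1 <= 4, 0, 1)       # 0 0 0 1 1 0
```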
2. Renaming Variables
• To rename variables in R, utilize the rename() function from the “dplyr” package.
• Selecting Rows with Missing Values: Choose rows containing missing values using the is.na()
function.
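• A renaming sketch, with a base R fallback for when dplyr is not installed. The data frame and the new name
satisfaction are hypothetical:

```r
df <- data.frame(Q1 = c(1, 2), Q2 = c(3, 4))   # illustrative data

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr syntax: new_name = old_name
  renamed <- dplyr::rename(df, satisfaction = Q1)
} else {
  # Base R equivalent: overwrite the matching column name
  renamed <- df
  names(renamed)[names(renamed) == "Q1"] <- "satisfaction"
}

names(renamed)   # "satisfaction" "Q2"
```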
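• Selecting the rows with missing values might look like this. The data frame is made up for illustration:

```r
df <- data.frame(id = 1:4, score = c(90, NA, 75, NA))   # illustrative data

# Rows where the score column is missing
missing_rows <- df[is.na(df$score), ]

# Rows with a missing value in any column
any_missing <- df[!complete.cases(df), ]
```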
5. Sorting
• Sort vectors with the sort() function and data frames with the order() function.
• Sorting a Vector: Sort a vector, e.g., in descending order.
• Sorting a Data Frame: Sort a data frame by specific columns, e.g., sort “Gender” in ascending
order and “SAT” in descending order.
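• A sorting sketch. The vector and the students data frame are made-up values; the column names “Gender”
and “SAT” follow the text above:

```r
v <- c(3, 1, 2)
sorted_v <- sort(v, decreasing = TRUE)   # 3 2 1

students <- data.frame(Gender = c("M", "F", "F", "M"),
                       SAT = c(1100, 1250, 1380, 1200))

# order() returns row positions; the minus sign reverses the numeric column
sorted_df <- students[order(students$Gender, -students$SAT), ]
sorted_df   # F 1380, F 1250, M 1200, M 1100
```

Note that sort() works on vectors only; a data frame is reordered by indexing its rows with order().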
6. Value Labeling
• Use the factor() function for nominal data and ordered() for ordinal data to label values.
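• A value-labeling sketch. The numeric codes and their labels are made up for illustration:

```r
# Nominal data: codes 1 and 2 have no natural order
gender_codes <- c(1, 2, 2, 1)
gender <- factor(gender_codes, levels = c(1, 2),
                 labels = c("Male", "Female"))

# Ordinal data: codes 1 to 3 run from low to high
rating_codes <- c(3, 1, 2)
rating <- ordered(rating_codes, levels = c(1, 2, 3),
                  labels = c("Low", "Medium", "High"))

rating[1] > rating[2]   # TRUE: "High" ranks above "Low"
```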
• Removing Rows with Missing Values: Create a new dataset without missing data:
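• One way to do this in base R is na.omit(), sketched here on a made-up data frame:

```r
df <- data.frame(id = 1:4, score = c(90, NA, 75, NA))   # illustrative data

# Keep only the rows that have no missing values
clean <- na.omit(df)
nrow(clean)   # 2
```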
8. Aggregate by Groups
• Aggregate data by grouping variables. Calculate, for example, the mean of variable “x” by
grouped variable “y”:
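• A grouped mean can be sketched with the base R aggregate() function. The data values are made up; the
column names x and y follow the text above:

```r
df <- data.frame(y = c("a", "a", "b", "b"),
                 x = c(1, 3, 10, 20))   # illustrative data

# Formula interface: x summarized within each level of y
group_means <- aggregate(x ~ y, data = df, FUN = mean)
group_means   # a: 2, b: 15
```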
1. Using dplyr
• Filtering by a single condition
• Let’s say we want to select flight details for JetBlue Airways (carrier code B6) flights originating from
JFK airport.
OR Condition:
• Select flight details for JetBlue Airways (carrier code B6) OR flights with JFK origin.
Multiple OR Conditions:
• Select flight details for flights with carrier codes B6, US, or AA.
NOT IN Condition:
• Remove flights with carrier codes B6, US, or AA.
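• The four filters above can be sketched with dplyr's filter() function, assuming dplyr is installed. The
flights data frame here is a tiny made-up stand-in with assumed column names carrier and origin, not the
dataset used in the original examples:

```r
# Small illustrative data frame (hypothetical rows)
flights <- data.frame(carrier = c("B6", "AA", "US", "DL", "B6"),
                      origin  = c("JFK", "JFK", "LGA", "EWR", "LGA"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Both conditions must hold: JetBlue flights from JFK
  both   <- dplyr::filter(flights, carrier == "B6", origin == "JFK")
  # OR condition: JetBlue flights, or any flight from JFK
  either <- dplyr::filter(flights, carrier == "B6" | origin == "JFK")
  # Multiple OR conditions: carrier is one of several codes
  in_set <- dplyr::filter(flights, carrier %in% c("B6", "US", "AA"))
  # NOT IN condition: drop those carriers
  not_in <- dplyr::filter(flights, !carrier %in% c("B6", "US", "AA"))
}
```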
Linear Regression
• Linear regression is a statistical method used to make predictions based on the relationship between a
response variable (often denoted as “y”) and one or more predictor variables (usually denoted as “x”).
Essentially, it helps us establish a linear connection between predictor and response variables.
• Mathematically, this relationship is expressed by the equation:
• y = ax + b
• Let’s break down what each part represents:
• y: The variable we want to predict, also known as the response variable.
• x: The variable(s) we use to make predictions, often referred to as predictor variables.
• a and b: These are constants known as coefficients. a is the slope of the line and b is the intercept
(the value of y when x is 0).
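• Fitting such a model in R might look like the sketch below. It uses the lm() function on the built-in
mtcars dataset, with mpg as the response and wt as the predictor. The dataset and variables are chosen here
for illustration:

```r
# Fit y = ax + b with y = mpg and x = wt
model <- lm(mpg ~ wt, data = mtcars)

coef(model)      # "(Intercept)" is b, "wt" is the slope a
summary(model)   # coefficients, residuals, and goodness of fit

# Predict the response for a new predictor value
predicted <- predict(model, newdata = data.frame(wt = 3))
```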
• Calling summary() on the fitted model provides insights into coefficients, residuals, and model
goodness of fit.