R Programming 1

Brainalyst is a data-driven company focused on transforming data into actionable insights through advanced analytics, AI, and machine learning. Their mission is to empower clients with comprehensive data solutions, offering services in data analytics, AI, training, and generative AI. Founded by Nitin Sharma, Brainalyst has grown significantly since its inception, becoming a leader in the field and emphasizing continuous learning and innovation.
Brainalyst’s
All You Need To Know Series
To Become a Successful Data Professional

R Programming

ABOUT BRAINALYST

Brainalyst is a pioneering data-driven company dedicated to transforming data into actionable insights and
innovative solutions. Founded on the principles of leveraging cutting-edge technology and advanced analytics,
Brainalyst has become a beacon of excellence in the realms of data science, artificial intelligence, and machine
learning.

OUR MISSION

At Brainalyst, our mission is to empower businesses and individuals by providing comprehensive data solutions
that drive informed decision-making and foster innovation. We strive to bridge the gap between complex data and
meaningful insights, enabling our clients to navigate the digital landscape with confidence and clarity.

WHAT WE OFFER

1. Data Analytics and Consulting


Brainalyst offers a suite of data analytics services designed to help organizations harness the power of their
data. Our consulting services include:

• Data Strategy Development: Crafting customized data strategies aligned with your business
objectives.

• Advanced Analytics Solutions: Implementing predictive analytics, data mining, and statistical
analysis to uncover valuable insights.

• Business Intelligence: Developing intuitive dashboards and reports to visualize key metrics and
performance indicators.

2. Artificial Intelligence and Machine Learning


We specialize in deploying AI and ML solutions that enhance operational efficiency and drive innovation.
Our offerings include:

• Machine Learning Models: Building and deploying ML models for classification, regression,
clustering, and more.

• Natural Language Processing: Implementing NLP techniques for text analysis, sentiment analysis,
and conversational AI.

• Computer Vision: Developing computer vision applications for image recognition, object detection,
and video analysis.

3. Training and Development


Brainalyst is committed to fostering a culture of continuous learning and professional growth. We provide:

• Workshops and Seminars: Hands-on training sessions on the latest trends and technologies in
data science and AI.

• Online Courses: Comprehensive courses covering fundamental to advanced topics in data
analytics, machine learning, and AI.

• Customized Training Programs: Tailored training solutions to meet the specific needs of
organizations and individuals.

2021-2024
4. Generative AI Solutions

As a leader in the field of Generative AI, Brainalyst offers innovative solutions that create new content and
enhance creativity. Our services include:

• Content Generation: Developing AI models for generating text, images, and audio.

• Creative AI Tools: Building applications that support creative processes in writing, design, and
media production.

• Generative Design: Implementing AI-driven design tools for product development and
optimization.

OUR JOURNEY

Brainalyst’s journey began with a vision to revolutionize how data is utilized and understood. Founded by
Nitin Sharma, a visionary in the field of data science, Brainalyst has grown from a small startup into a renowned
company recognized for its expertise and innovation.

KEY MILESTONES:

• Inception: Brainalyst was founded with a mission to democratize access to advanced data analytics and AI
technologies.

• Expansion: Our team expanded to include experts in various domains of data science, leading to the
development of a diverse portfolio of services.

• Innovation: Brainalyst pioneered the integration of Generative AI into practical applications, setting new
standards in the industry.

• Recognition: We have been acknowledged for our contributions to the field, earning accolades and
partnerships with leading organizations.

Throughout our journey, we have remained committed to excellence, integrity, and customer satisfaction.
Our growth is a testament to the trust and support of our clients and the relentless dedication of our team.

WHY CHOOSE BRAINALYST?

Choosing Brainalyst means partnering with a company that is at the forefront of data-driven innovation. Our
strengths lie in:

• Expertise: A team of seasoned professionals with deep knowledge and experience in data science and AI.

• Innovation: A commitment to exploring and implementing the latest advancements in technology.

• Customer Focus: A dedication to understanding and meeting the unique needs of each client.

• Results: Proven success in delivering impactful solutions that drive measurable outcomes.

JOIN US ON THIS JOURNEY TO HARNESS THE POWER OF DATA AND AI. WITH BRAINALYST, THE FUTURE IS
DATA-DRIVEN AND LIMITLESS.

TABLE OF CONTENTS
1. Preface
2. Introduction to R Programming
• History of R Programming
• Evolution and Expansion
• Popularity and Community
3. Getting Started with R
• Installation of R and Environment Setup
• The R Interface
• What is RStudio and How Does It Improve the R Experience?
4. Characteristics of R Programming
• Simplicity and Effectiveness
• Data Analysis Focus
• Robust Programming Concepts
• Integrated Toolset
5. Basics of R Programming: Getting Started
• Your First R Equation
• Comments in R
• Assignment Operator
• Working Directory
• Data Structures in R
6. Data Structures in R Programming
• Atomic Vectors
• Lists
• Arrays
• Matrices
• Data Frames
• Factors
7. Execution of the Code
• Console
• R Script
• Execution of Code in R Scripts
8. Variables in R Programming
• Assigning Variables in R Programming
• Displaying Variable Values
9. Operators in R Programming
• Arithmetic Operators
• Relational Operators
• Logical Operators
• Package Reference Operator
10. Data Types in R
• Integer
• Complex
• Character
• Factor
• Logical
• Date and Time
11. Packages in R
• What Constitutes a Package?
• Utilizing Packages in R
• Diverse Categories of Libraries in R
12. Importing and Exporting Data in R
• Importing Data into R
• Exporting Data from R
13. Reserved Keywords in R Programming
• If Statement
• Else Statement
• Repeat Keyword
• While Keyword
• Function Keyword
• For Loop
• Next Keyword
• Break Keyword
• TRUE/FALSE
• NULL
• Inf and NaN
• NA
14. R Functions
• Components of Functions
• Types of Functions
• Function Calls
• Built-in Functions in R
15. R Data Visualization
• R Visualization Packages
• R Graphics
• Standard Graphics
• Graphics Devices
• Basics of the Grammar of Graphics
• Advantages and Disadvantages of Data Visualization in R
16. Creating Charts and Graphs in R
• R Pie Charts
• Bar Charts in R
• Boxplots in R
• Histograms in R
• Line Graphs in R
• Scatterplots in R
17. Data Exploration with R
• Importing Data
• Calculate Basic Descriptive Statistics
• List Variable Names
• Calculate the Number of Rows and Columns
• Display Dataset Structure
• Viewing Rows
• Selecting Random Rows
• Counting Missing Values
18. Data Manipulation with R
• Replacing/Recoding Values
• Renaming Variables
• Keeping and Dropping Variables
• Subset Data
• Sorting
• Value Labeling
• Dealing with Missing Data
• Aggregate by Groups
• Frequency for Vector
• Merging (Matching)
• Removing Duplicates
• Combining Columns and Rows
19. Filtering Data in R
• Filtering by a Single Condition
• Filtering Based on Multiple Conditions
• Filtering by Row and Column Position
• Selecting Non-Missing Data
20. Linear Regression in R
• Steps for Performing Linear Regression
• Creating a Relationship Model
• Using the Predict Function
• Plotting the Regression

Preface
Welcome to the “Basic to Advanced R Programming” handbook, a comprehensive guide to
mastering R, one of the most powerful programming languages for statistical computing
and data analysis. Whether you’re new to R or looking to deepen your understanding, this
handbook is designed to assist you at every step of your journey.
In today’s data-driven world, the ability to analyze and visualize data effectively is paramount.
R provides a robust environment for statistical analysis, enabling users to transform raw data
into meaningful insights. This handbook covers everything from the basics of R programming
to advanced techniques, ensuring you have the knowledge to tackle complex data challenges.
Starting with the history and evolution of R, we’ll guide you through the installation process
and familiarize you with the R interface, including RStudio. As you progress, you’ll learn about
various data structures, execution of code, and the fundamentals of data manipulation and
visualization. This handbook emphasizes hands-on practice with detailed examples to help
you gain practical experience.
You’ll also explore advanced topics such as linear regression, filtering data, and data
exploration techniques. By the end of this handbook, you’ll be well-equipped to use R for
data analysis, statistical modeling, and creating sophisticated visualizations.
I am Nitin Sharma, CEO and Founder of Brainalyst – A Data-Driven Company. This handbook is the
result of extensive collaboration and dedication. I would like to acknowledge the unwavering support
from Brainalyst – A Data-Driven Company and its talented team. Their expertise and commitment
have been invaluable in bringing this comprehensive guide to life.

Join us on this journey to unlock the full potential of R programming. Let’s explore, analyze,
and visualize data together.

Nitin Sharma
Founder/CEO
Brainalyst- A Data Driven Company

Disclaimer: This material is protected under copyright act Brainalyst © 2021-2024. Unauthorized use and/ or
duplication of this material or any part of this material including data, in any form without explicit and written
permission from Brainalyst is strictly prohibited. Any violation of this copyright will attract legal actions.

BRAINALYST - R-PROGRAMMING

Basic to Advanced R Programming


History
• R programming, often referred to simply as ‘R’, is a powerful open-source language and environment
designed for statistical computing and data analysis. It originated in the early 1990s, when Ross Ihaka and
Robert Gentleman initiated its development at the University of Auckland, New Zealand. The following is
an abbreviated history of R programming:

Birth of R (1990s):
• Development of R commenced in the early 1990s, when Ross Ihaka and Robert Gentleman embarked on
a project to create an open-source alternative to the S programming language, a commercial
statistical software package developed at Bell Laboratories. Their primary goal was to provide researchers,
statisticians, and data analysts with a robust tool for data analysis and visualization.

Release as open source (1995):


• In 1995, R was officially released as free software under the GNU General Public License (GPL). This
open-source approach fostered rapid growth in the R community and significantly contributed to its widespread
adoption. The R project also established CRAN (Comprehensive R Archive Network), a central repository for
user-contributed packages and extensions.

Evolution and expansion (2000s):


• Throughout the 2000s, R experienced substantial development and expansion. The R community grew,
giving rise to a vast ecosystem of packages and libraries designed for various statistical and data analysis
tasks. R’s flexibility and extensibility made it a preferred choice in academia, research, and industry for data
exploration, visualization, and statistical modeling.

R in data science (2010s):


• As data science gained prominence in the 2010s, R emerged as a fundamental tool for data scientists and
statisticians. Its extensive package ecosystem, particularly for machine learning and data visualization,
solidified its position as a top choice for data analysis. R’s compatibility with other programming languages
like Python and its capability to handle large datasets using packages such as data.table and sparklyr
further cemented its role in the data science field.

Popularity and community (present):


• Presently, R remains widely utilized across various domains, including academia, healthcare, finance, and
technology. It boasts a large and active community of users and contributors. R is renowned for its abundant
documentation, tutorials, and online forums where users can seek assistance and share knowledge.
• R is a widely recognized and powerful programming language utilized for data analysis, statistical modeling,
and data visualization. Its growing popularity has solidified its position as a prominent tool in the
realm of predictive modeling. Several surveys and industry reports underscore the significance of R:

Surveys and rankings:


• In the 2016 Data Science Salary Survey conducted by O’Reilly, R secured the second position among programming
languages for data science, with SQL ranking first.

• In the KDnuggets analytics software survey, R emerged as the top choice, garnering 49% of the votes. These
surveys affirm R’s extensive scope and importance in the field of data science.


Rich package ecosystem:


• R boasts an extensive repository of more than 10,000 free packages, augmenting its capabilities for data
science and analysis. Continuous growth in the number of packages reflects the increasing interest and
engagement of the R community.

Cost-effective solution:
• R offers a cost-effective solution for performing a wide spectrum of data analysis and data science tasks.
In contrast, achieving similar functionality in SAS typically requires the purchase of a bundle of SAS software
and modules.

Highly paid skill:


• R is recognized as one of the most lucrative IT skills and holds a substantial share in the advanced analytics
software market.

Integration and industry adoption:


• R seamlessly integrates with popular software tools like Tableau and SQL Server. Microsoft’s acquisition
of Revolution Analytics resulted in the integration of R with SQL Server, Visual Studio, and PowerBI.

Rapid algorithm development:


• R enjoys a reputation for the swift implementation of new statistical and machine learning algorithms
compared to other statistical tools. This quality makes it a preferred choice for researchers and data
scientists.

Companies utilizing R:
• Numerous top-tier companies and organizations rely on R for various data-driven endeavors. Examples:

Top tier companies using R:


• Facebook: Utilizes R for behavior analysis related to status updates and profile pictures.
• Google: Leverages R for assessing advertising effectiveness and economic forecasting.
• Twitter: Employs R for data visualization and semantic clustering.
• Microsoft: Following the acquisition of Revolution Analytics, Microsoft utilizes R for a variety of
purposes.
• Uber: Utilizes R for statistical analysis.
• Airbnb: Scales data science operations using R.
• IBM: Actively participates in the R Consortium group.
• ANZ: Applies R for credit risk modeling.
• HP
• Ford
• Novartis
• Roche
• New York Times: utilizes R for data visualization.
• McKinsey
• BCG

• Bain

IT Companies Using R:
• Prominent IT and professional services companies, both in India and globally, incorporate R into their
operations. Some of them include:
• Accenture
• Amadeus IT Group
• Capgemini
• Cognizant
• CSC
• HCL Technologies
• Hexaware Technologies
• HP
• IBM
• IGATE
• Infosys
• Larsen & Toubro Infotech
• Microsoft
• Mindtree
• Mphasis
• NIIT Tech
• Oracle Financial Services Software
• Paytm
• Snapdeal
• R Systems Ltd
• Tata Consultancy Services
• Tech Mahindra
• Wipro
• Contributions to statistics and data analysis: R has played a pivotal role in advancing statistical
research and data analysis methodologies. Researchers have employed R to analyze data from diverse
fields, contributing to scientific breakthroughs and business insights. Many state-of-the-art statistical
methods and machine learning algorithms are available as R packages, which makes them accessible
to a wide audience.

Installation of R and environment setup:


R is a freely available language and environment designed for statistical computing and graphics. It offers a wide
range of capabilities, including:
• Exploring and manipulating data
• Building and validating predictive models
• Applying machine learning and text mining algorithms
• Creating visually appealing graphs
• Connecting with databases
• Building online dynamic reports or dashboards
• Sending emails or push notifications via R


How to download and install R:


Download R:
• Visit the official R Project website (https://www.r-project.org/).
• Click on the ‘CRAN’ link located on the left-hand side of the page.
• Choose your country from the list and click on the link provided for your location.
• On the next page, click ‘Download R for Windows’ (if you’re using Windows) or choose the appropriate
option for your operating system.

Installation:
• Run the downloaded installation file.
• Follow the installation instructions, generally accepting the default settings unless you have specific
preferences.
• During the installation, you may be asked to select additional components or configure options;
make choices according to your needs.

R Window
• R Editor - To open it, navigate to File >> New Script (Shortcut: CTRL + N). This is the space
where you can write your code or program. You can execute your code by pressing F5.
• R Console - This is where you can view the results or output of your code. While you can write
code in the R Console, code entered there cannot be saved as a script.


The R Interface: A New User’s Perspective


• For many newcomers to R, the interface can be off-putting. It may appear less visually appealing and
user-friendly compared to software like SAS or SPSS. Some find it challenging to write and manage code
within this environment.

Feeling Bored with the R Interface?


• If you’re looking to make your R experience more engaging and user-friendly, consider trying out
RStudio.

What Is RStudio, and How Does It Improve the R Experience?


How to download RStudio?
R works as the backend and RStudio works as the frontend.
• Search Google for “install RStudio”, or go to https://posit.co/download/rstudio-desktop/
• Install the free version.
• RStudio is an independent IDE and is not maintained by CRAN.

Provides
• GUI
• Debugging options (help in solving errors)
• Auto-completion of code
• Options for saving the code, code history, and objects (facilitating code reusability and sharing
of objects)
Access RStudio
• Start > RStudio


• RStudio is designed to simplify the life of R programmers. It’s an open-source tool available for
free, offering several premium features that enhance your programming experience. Unlike standard
R, RStudio provides:
• Intelligent Code Completion: RStudio suggests code completions, making your coding more
efficient.
• Syntax Highlighting: It highlights syntax, helping you spot errors and structure your code better.
• Structured R Documentation: RStudio gives you easy access to documentation, making it simple
to look up function descriptions and examples.
• Interactive Debugging Tools: Debugging in RStudio is more effective, allowing you to identify
and fix issues in your code with ease.
• In simple terms, RStudio acts as an interface enhancement for standard R. The programming process
is similar in both platforms.

Here Are Some Handy RStudio Shortcuts:


• Press CTRL + Enter to execute your code.
• Use CTRL + SHIFT + C to quickly comment or uncomment lines of code.
• To create a new R script, press CTRL + SHIFT + N.

The RStudio layout (components)


Console
• The brain of RStudio
• Here we can write code and immediately see the output.
• Code written in the console cannot be saved using RStudio’s saving mechanism.
• All code is executed in the console.


Terminal
• The terminal provides a command-line interface to interact with your operating system. It’s useful
for executing system commands and managing files.
• It is like the command prompt, where shell commands can be executed.

Environment
• The Environment tab shows the variables and objects you’ve created during your R session. You
can view their values and manage the workspace.
• All objects created in R are stored in RAM and listed in the environment.
• Example: run the following line and the object Class1 appears in the environment:
• Class1 = 12345678

History
• All code executed in the current and previous sessions is saved in this window.

Connection
• The connections tab allows you to connect to databases, APIs, or other data sources. It facilitates
data retrieval and analysis.
• This is a newer, experimental feature in RStudio.
• It is being developed to connect RStudio with ODBC and Spark databases for big data work.


Files
• The working directory is the place where files are saved automatically and from which other files can
be easily imported.
• By default, the working directory for R is the Documents folder, often indicated via a symbol.
• The Files pane provides access to your project’s directory structure. You can navigate files, folders,
and manage your project resources.

Plots
• The Plots pane displays graphical plots and visualizations generated by R. It helps you visualize
data and interpret results graphically.
• This is where graphs are shown.
• Example- plot: object 7272 = 774
• plot(20:2000)

Packages
• This window shows all the installed packages (libraries) available in R.
• There are two types of library: user libraries and system libraries.
• The Packages pane allows you to manage and install R packages. Packages contain functions,
datasets, and tools for specific tasks, enhancing R’s capabilities.
Help
• It contains all the information needed to run and understand the functions provided by various libraries.
• In the console we can type ?functionname and its documentation will appear in this window.
Example: ?class
• Do not use ?functionname
Viewer
• The Viewer pane displays dynamic HTML content generated by R, such as interactive visualiza-
tions or reports.
• It allows the user to access multimedia output.
• Buttons on the ribbon
• Note: check the important settings; the defaults can also be kept.
• Go to the Tools menu > click on Global Options.


Click on the Code option, then Editing, and check the settings.

Click on Display and check the settings, or go through the videos.


Characteristics of R Programming
• R is a specialized programming language designed for data analysis, known for its distinctive features that
enhance its power and efficiency. Some of the notable characteristics of R programming include:

Simplicity and Effectiveness:


• R is recognized for its simplicity and effectiveness in handling data analysis tasks.

Data Analysis Focus:


• R is primarily oriented toward data analysis and statistical computing.

Robust Programming Concepts:


• It incorporates robust programming concepts such as user-defined functions, loops, conditional
statements, and various input/output capabilities.

Integrated Toolset:
• R provides a cohesive and integrated set of tools tailored for data analysis tasks.

Array and Vector Operations:


• R offers a comprehensive suite of operators for performing diverse calculations on arrays, lists,
and vectors.

Efficient Data Handling:


• R includes efficient data handling and storage capabilities, making it suitable for working with
datasets of varying sizes.


Open-Source Nature:
• It is an open-source platform, freely available for use and highly extensible through community
contributions.

Extensive Graphical Techniques:


• R boasts a wide range of extensible graphical techniques, aiding in data visualization and
interpretation.

Vectorised Operations:
• One of R’s standout features is its ability to perform multiple calculations using vectors,
streamlining complex operations.
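
Vectorised operations can be illustrated with a minimal sketch. The data and variable names (prices, quantities) are hypothetical and chosen only for illustration:

```r
# Vectorised arithmetic: the operation is applied to every element at once
prices <- c(10, 20, 30)        # hypothetical example data
quantities <- c(2, 1, 4)

totals <- prices * quantities  # element-wise multiplication, no loop needed
discounted <- totals * 0.9     # the single value 0.9 is recycled across the vector
above_30 <- totals > 30        # comparisons are vectorised too

print(totals)
print(discounted)
print(above_30)
```

Each line works on the whole vector, so no explicit for loop is required.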

Interpreted Language:
• R is an interpreted language, allowing for interactive and dynamic data analysis.

Basics of R Programming: Getting Started


• Let’s dive into the fundamental aspects of R programming to kickstart your journey:
• Your First R Equation: To perform basic calculations in R, enter expressions like 5*3 in the RStudio code
editor and press CTRL + ENTER (or F5 in standard R). The result is preceded by [1], indicating it’s the
first result.

• Comments in R: Lines starting with ‘#’ are comments in R and are not executed as code.

• Assignment Operator: You can assign values using <- or = interchangeably. For example, x <- 10 or y =
5 both assign values to variables.
• Working Directory: Use getwd() to check the current working directory. You can change it using setwd().
• File Paths in R: R uses forward slashes (/) instead of single backward slashes (\) for file paths
(e.g., "C:/Users/YourName/Documents").
• Combining Values: The c() function is used to combine values into a vector.
• Code Formatting: In RStudio, press CTRL + SHIFT + A to format your code neatly.
• Handling Missing Values: R uses NA to represent missing values. To calculate sums excluding NA, use
na.rm = TRUE (by default, it is FALSE).
• Generating Sequences: The form 1:10 generates integers from 1 to 10.
• Case Sensitivity: R is case-sensitive, so use the correct case for variable and function names.
• Getting Help: To get help for a function, use the help() function (e.g., help(sum))
• Naming Conventions: Object names can include letters, numbers, underscores (_), or periods (.), but they
must start with a letter (or with a period that is not followed by a number).
• Data Structures: R has various data structures like vectors, factors, data frames, matrices, arrays, and
lists. Data frames are similar to datasets in SAS.
• Importing CSV Files: Use read.csv() to import CSV files into R. For example:
mydata <- read.csv("c:/mydatafile.csv", header=TRUE).
• Opening Tables/Functions: Use the fix() function to open data frames or functions in an editor. For
instance, fix(mydata) opens the ‘mydata’ data frame.
• Viewing Source Code: To view the source code of a function, use fix() with the function’s name (e.g.,
fix(colSums)).
• Retrieving Previous Commands: In R Console, you can retrieve previous commands with the UP arrow
key and edit them to rerun.
• Installing and Loading Packages: Install packages with install.packages("package_name") and load
them with library("package_name").
• Saving Data in CSV Format: Use write.csv() to save a data frame in CSV format. For instance,
write.csv(mydata, "file1.csv").
• Saving Data in R Format: Save a single data frame in R format using save(mydata, file = "mydata.RData").
Note that save.image("mydata.RData") saves every object in the workspace, not just one data frame.
• Loading RData: To load an RData file, use load("mydata.RData").
• Using attach(): The attach() function adds a data frame to R’s search path so that its columns can be
referenced by name. For example, attach(mydata) attaches the ‘mydata’ data frame.
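
Several of the basics listed above can be tried together in one short script. This is a sketch only: the data values and object names are hypothetical, and the CSV file is written to a temporary location rather than a fixed path:

```r
# Arithmetic and assignment
5 * 3                       # a bare expression prints its result
x <- 10                     # assignment with <-
y = 5                       # assignment with = also works at the top level

# Combining values, missing values, and sequences
scores <- c(4, 8, NA, 15)   # hypothetical data with one missing value
total_all <- sum(scores)                  # NA, because na.rm defaults to FALSE
total_known <- sum(scores, na.rm = TRUE)  # ignores the missing value
one_to_ten <- 1:10

# Round trip through a CSV file stored in a temporary location
mydata <- data.frame(id = 1:3, score = c(4, 8, 15))
csv_path <- tempfile(fileext = ".csv")
write.csv(mydata, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path, header = TRUE)

print(total_known)
print(reloaded)
```

Passing row.names = FALSE keeps write.csv() from adding an extra column of row numbers, so the reloaded data frame has the same two columns as the original.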


Data Types in R:
• In programming languages, we utilize variables to store and manage various data. Variables act as designated
memory locations where program information is stored. When we create a variable within our code,
it prompts the allocation of memory space.
• R programming supports multiple data types, including integers, strings, and more. The allocation of
memory by the operating system depends on the specific data type associated with the variable, dictating
the kind of data that can be stored within the reserved memory.
• In R programming, we encounter a range of data types, including integer, complex, character, factor,
logical, and date and time.
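
The data type of a value can be inspected with class(). The following is a minimal sketch; the variable names are chosen only for illustration:

```r
# class() reports the data type of a value
whole   <- 42L                    # integer (the L suffix marks an integer)
decimal <- 3.14                   # numeric (double precision)
cplx    <- 2 + 3i                 # complex
text    <- "hello"                # character
flag    <- TRUE                   # logical
level   <- factor("low")          # factor
day     <- as.Date("2024-01-15")  # date

types <- c(class(whole), class(decimal), class(cplx), class(text),
           class(flag), class(level), class(day))
print(types)
```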


Data Structures in R Programming


• A solid grasp of data structures is crucial in the realm of R programming. Data structures represent the
objects we routinely manipulate in R, and proficiency in handling diverse data structures is pivotal for
effective data analysis. Within the R environment, nearly everything is treated as an object.
R provides a spectrum of data structures, encompassing:
Atomic Vectors:
• Atomic vectors stand as the fundamental building blocks of data structures in R.
• They manifest in six distinct flavors: logical, integer, character, double, complex, and raw.
• An atomic vector is a collection of elements that are all of the same data type.

Illustration:
• Numeric values: c(1, 2, 3, 4, 5)
• Text or character values are stored in character vectors.
• Character values: c("Apple", "Banana", "Cherry")
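
As a short sketch of atomic vectors, the snippet below builds the two vectors from the illustration and checks their types with typeof(). The object names, and the mixed vector, are added here only for illustration:

```r
numbers <- c(1, 2, 3, 4, 5)
fruits <- c("Apple", "Banana", "Cherry")

number_type <- typeof(numbers)   # numbers written without L are stored as double
fruit_type <- typeof(fruits)

# Mixing types in c() forces everything to a single common type
mixed <- c(1, "two", TRUE)
mixed_type <- typeof(mixed)

print(number_type)
print(fruit_type)
print(mixed)
```

Because an atomic vector holds one type only, the number and the logical value in mixed are converted to character strings.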

Lists:
• Lists in R exhibit remarkable versatility by accommodating elements of divergent data types.
• Unlike atomic vectors, lists defy constraints by allowing mixed data types within.
• Lists are often referred to as generic vectors due to their capacity to house an assortment of R
objects.
• my_list <- list(Name = "John", Age = 30, Married = TRUE)
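
Building on the my_list example, this sketch shows common ways to read and extend a list. The City element is a hypothetical addition for illustration:

```r
my_list <- list(Name = "John", Age = 30, Married = TRUE)

person_name <- my_list$Name        # access an element by name with $
person_age <- my_list[["Age"]]     # access an element by name with [[ ]]
sub_list <- my_list[1]             # single brackets return a smaller list

# Each element keeps its own data type
element_types <- sapply(my_list, class)
print(element_types)

# Add a new element (hypothetical value)
my_list$City <- "Pune"
print(length(my_list))
```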
Arrays:
• Arrays serve as repositories for data extending beyond two dimensions.
• These structures house data of uniform data types, and they feature contiguous memory allocation.
• For instance, a 3-dimensional array with dimensions 2 x 3 x 4 holds four 2x3 matrices.
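
A 3-dimensional array can be created with the array() function. The sketch below assumes dimensions of 2 rows, 3 columns, and 4 layers, filled with the numbers 1 to 24 purely for illustration:

```r
# 2 rows x 3 columns x 4 layers, filled column by column
my_array <- array(1:24, dim = c(2, 3, 4))

array_dims <- dim(my_array)
first_slice <- my_array[, , 1]    # the first of the four 2x3 matrices
last_value <- my_array[2, 3, 4]   # row 2, column 3, layer 4

print(array_dims)
print(first_slice)
print(last_value)
```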

Matrices:
• Matrices adopt a two-dimensional, rectilinear format for storing elements.
• Constituent elements within a matrix share a common atomic type, often numeric to facilitate
mathematical computations.
• Matrices are brought into existence through the utilization of the matrix() function.
• my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
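
The my_matrix example can be explored further with a short sketch. The names by_row and row_totals are introduced here for illustration:

```r
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(my_matrix)                  # values are filled column by column

element <- my_matrix[2, 3]        # row 2, column 3
transposed <- t(my_matrix)        # swaps rows and columns (3 x 2)
row_totals <- rowSums(my_matrix)  # one sum per row

# byrow = TRUE fills the matrix row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(by_row)
```

By default matrix() fills columns first, so the first row of my_matrix is 1, 3, 5, while the first row of by_row is 1, 2, 3.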

Data Frames:
• Data frames resemble a two-dimensional array, akin to a table; unlike a matrix, each column may hold a
different data type.
• Each column serves as a repository for a specific variable, while rows aggregate corresponding
sets of values.
• Data frames are instrumental in data analysis. Their row names must be unique and their column
names must be non-empty.
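
A data frame can be built with the data.frame() function. In this sketch, the employees data set and its values are hypothetical:

```r
# Each column is one variable; each row is one observation
employees <- data.frame(
  name = c("Asha", "Ben", "Chen"),    # hypothetical values
  age = c(28, 35, 41),
  manager = c(FALSE, TRUE, FALSE)
)

str(employees)                        # structure: column names and types
frame_dims <- dim(employees)          # number of rows and columns

# Select the rows where age is above 30
older <- employees[employees$age > 30, ]
mean_age <- mean(employees$age)

print(older)
print(mean_age)
```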

Factors:
• Factors are specialized data entities employed for classification and the storage of data as discrete
levels.
• They possess the capability to encompass both character strings and integers.
• Factors prove especially valuable when confronted with columns harboring a finite number of
unique values.
• The factor() function is harnessed for the creation of factors within R.
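
The following sketch shows how factor() stores a small set of unique values as levels. The sizes data is hypothetical:

```r
sizes <- c("small", "large", "medium", "small", "large")

# Set the order of the levels explicitly
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

size_levels <- levels(size_factor)     # the unique categories
size_codes <- as.integer(size_factor)  # each value is stored as a level number
counts <- table(size_factor)           # frequency of each level

print(size_levels)
print(size_codes)
print(counts)
```

Internally a factor stores integer codes that point to its levels, which is why it suits columns with a limited number of unique values.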

1. Vectors:
• One-dimensional data structures in R are called vectors. All elements of a vector must be of the
same data type.

Quantitative (numeric) Vector:
• Numeric vectors hold numerical values, such as integers or decimals.

Illustration:
• Numeric values: c(1, 2, 3, 4, 5)
• Text or character values are stored in character vectors.
• Character values: c("Apple", "Banana", "Cherry")

Execution of the code


Where to write
• Console
• Code can be executed in the console, but it cannot be saved there.
• Code is executed by pressing Enter.

• R script
• It’s like a page in a notebook.
• Here we can give the script a name, save it, reuse it, and edit its code. We can also share R scripts with others.

• The R script has the extension of .R


• Notes:
• Use the # symbol to add comments explaining your code.

• Step 1: Open RStudio


• Open RStudio to begin coding.

• Step 2: Create a New Script


• Click on “File” > “New File” > “R Script” to create a new script.

• Step 3: Write Your Code


• Write your R code in the script. For example, let’s calculate the sum of numbers
from 1 to 10:

Example:
• # Calculate the sum of numbers from 1 to 10
• sum_result <- sum(1:10)
• print(sum_result)

• Step 4: Run Code Line by Line


• Select a line of code and click on the “Run” button or press Ctrl + Enter to run
the selected line. For example, select the line sum_result <- sum(1:10) and run it.

• Step 5: View Results


• The output of the executed line will be displayed in the Console pane at the bottom of RStudio. In this case, it will show the sum of numbers from 1 to 10.

• Step 6: Run the Whole Script


• After testing individual lines, you can run the entire script by clicking the “Source” button at the top of the script editor or by pressing Ctrl + Shift + Enter.

• Step 7: Save Your Script.


• Save your script by clicking on “File” > “Save” and providing a name and location for the script file.

• Step 8: Repeat and Modify


• Execution of code in R scripts:
• Here, pressing Enter does not execute the code; it simply moves the cursor to the next line.


• Note: Write code in an R script, because code typed in the console cannot be saved or edited.

How to execute code:


• Place the cursor on the line of code that needs to be executed and press Ctrl + Enter.
• Or use the Run button.

To execute multiple lines of code


• Steps: Select the lines that need to be executed and then press Ctrl + Enter.
• Or use the Run button.

Note: In a programming language, the order of the code matters.

• Example: if you have 1,000 lines of code, select them all with a shortcut such as Ctrl + A.

• Step 1: Write Your Code


• Write your R code in the script editor.

• Step 2: Execute Code Line by Line


• Position the cursor on the line of code you want to execute.
• Press Ctrl + Enter to execute the current line. The output will appear in the Console pane at the
bottom.

• Step 3: Execute Selected Code


• Select the lines of code you want to execute.
• Press Ctrl + Enter to execute the selected lines. The output will appear in the Console pane.

• Step 4: Execute Whole Script


• If you want to execute the entire script, click anywhere in the script editor.
• Press Ctrl + Shift + Enter. The entire script will be executed, line by line.

• Step 5: Comment and Uncomment Code


• To comment a single line: Place the cursor on the line and press Ctrl + Shift + C.
• To uncomment a line: Place the cursor on the commented line and press Ctrl + Shift + C again.

Variable in R programming
• In R programming, variables act as containers that store and reference data to be processed. A variable can hold various types of information, including atomic vectors, collections of atomic vectors, or combinations of different R objects.
• Unlike statically typed languages such as C++, R uses dynamic typing: R determines the data type of a variable at the moment the corresponding statement is executed. Variable names follow certain rules: a valid name can contain letters, numbers, dots, and underscores, and it must start with a letter or with a dot that is not followed by a number.

Assigning Variables in R Programming


• In R programming, you have three distinct operators for assigning values to variables: the leftward arrow (<-), the rightward arrow (->), and the equal-to operator (=).
• Furthermore, to display the value of a variable, R offers two useful functions, namely, print() and cat().
The print() function is designed to showcase the value of a variable, whereas the cat() function excels in
combining multiple values into a continuous output for printing purposes.
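A minimal sketch of the three assignment operators and the two display functions; the variable names and values are illustrative:

```r
x <- 10   # leftward arrow: the value on the right is stored in x
20 -> y   # rightward arrow: the value on the left is stored in y
z = 30    # equal-to operator

# print() shows the value of a single variable.
print(x)

# cat() joins several values into one continuous line of output.
cat("x =", x, "y =", y, "z =", z, "\n")

# Dynamic typing: the same name can later hold a different type.
x <- "ten"
class(x)
```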


• Output:

Arithmetic operators
• + (addition)
• - (subtraction)
• * (multiplication)
• / (division)
• %% (modulus, i.e. remainder)
• %/% (integer division)
• Example: 10 + 20
• 23 * 23
• 30 / 3
• 50 - 45
• 100 %/% 23
• 234 %% 34
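The arithmetic operators can be tried as a short script. This is a sketch with the results noted as comments; the variable names are illustrative:

```r
addition       <- 10 + 20     # 30
multiplication <- 23 * 23     # 529
division       <- 30 / 3      # 10
subtraction    <- 50 - 45     # 5
int_division   <- 100 %/% 23  # 4, the whole-number part of 100 / 23
remainder      <- 234 %% 34   # 30, what is left after taking out 6 * 34

print(c(addition, multiplication, division, subtraction, int_division, remainder))
```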

Relational operators
• Whenever we use a relational operator, we compare two objects, and the output is Boolean.
• Boolean is also known as logical.
• A Boolean is always either TRUE or FALSE.
• > (greater than)
• < (less than)
• <= (less than or equal to)
• >= (greater than or equal to)
• == (equal to)
• != (not equal to)
• The difference between = and ==
• a = 10 (assignment: saves the value 10 inside a)
• b = 30
• a == b (comparison: checks whether the values inside a and b are the same; here FALSE)
• a = b (assignment: saves the value of b inside a)
• a (a now holds 30)

Logical operators
• And &
• Or |
• Not !
• Here also the output is Boolean; we will use these operators in the data manipulation functions.
• Example: l1 = 10 < 5
• l2 = TRUE
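The relational and logical operators can be combined as in this sketch; the variable names follow the examples above, and the result names are illustrative:

```r
a <- 10
b <- 30

a > b    # FALSE
a <= b   # TRUE
a == b   # FALSE: comparison, a and b are left unchanged
a != b   # TRUE

l1 <- 10 < 5   # FALSE
l2 <- TRUE

and_result <- l1 & l2   # FALSE: both sides must be TRUE
or_result  <- l1 | l2   # TRUE: at least one side is TRUE
not_result <- !l1       # TRUE: negation of FALSE

print(c(and_result, or_result, not_result))
```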

Package reference operator


• ::
• Used to access functions from a specific package.
• Example: to use the median() function from the stats package, write stats::median(x).
• R has libraries (packages), and these packages contain functions.
• We use the package reference operator to call a function from a particular package.
• mylibrary::function101() # this is the convention, but it is not mandatory.
• function101() # we can also call the function directly once the package is loaded.
• Reasons for using the package reference:
• It is less error-prone, because R cannot confuse the function with a function of the same name from another package.
• It is easier for other users to understand where the function comes from.
• Example: mylibrary::view() # using the package reference
• view() # calling the function directly
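A runnable sketch of the :: operator, using median() from the stats package (a package that ships with R) as the illustration; the values vector is an assumption:

```r
values <- c(4, 8, 15, 16, 23, 42)

# Explicit package reference: always uses the stats version of median().
explicit_result <- stats::median(values)

# Direct call: works because the stats package is attached by default.
direct_result <- median(values)

print(explicit_result)
print(direct_result)
```

Both calls return the same value here; the explicit form matters most when two loaded packages define a function with the same name.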

Other operators
• There are other operators that you may come across; they will be introduced as they are needed.

Data types found in various languages:


• Integer, float, string, character, numeric, date, Boolean, int, largeint, smallint, varchar, char.

In R, we have multiple kinds of data types, but the major ones are as follows:
The data type related to number
1. Integer
2. Numeric
3. Short number with decimal
4. Long number with decimal
5. Short number without decimal
6. Long number without decimal
Integer
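• It appears in special cases, for example when importing certain files.
• An integer is a small whole number; in R the range is
• -2147483647 to 2147483647 (see .Machine$integer.max).
• Only a small number without a decimal part can have integer as its data type.
• Example:
• n1 = 34L
• n2 = 45L
• n3 = -87L
• n4 = 32434L
• class(n1)
• class(n2)
• class(n3) # class() finds the data type; the L suffix makes a value an integer, while a plain 34 is stored as numeric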
Complex
• Real + imaginary
• For example, 5+6i
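The number-related data types can be checked with class(), as in this sketch; the variable names are illustrative:

```r
n_numeric <- 34       # no suffix: stored as numeric (double)
n_integer <- 34L      # L suffix: stored as integer
n_decimal <- 34.75    # numbers with decimals are numeric
n_complex <- 5 + 6i   # real part 5, imaginary part 6

class(n_numeric)   # "numeric"
class(n_integer)   # "integer"
class(n_decimal)   # "numeric"
class(n_complex)   # "complex"

# Largest value an integer can hold in R.
.Machine$integer.max
```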


Data type related to characters (text)


Character
• Any value made up of letters, alphanumeric text, or symbols and written in quotes has the character data type.
• Example: c1 = "abs"
• c2 = "abc123"
• c3 = "123abd"
• c4 = "3455"
Factor:
• Unique to R. It will be explained in the upcoming classes.
Data type related to Boolean
Logical
• This data type is used to represent Boolean values, i.e. TRUE or FALSE.
Date and time
• Date and time are generally not basic built-in data types in many languages, such as R and Python.
• They are derived data types, i.e. we have to convert an object into this data type and assign it manually.
• POSIXct (POSIX is the Portable Operating System Interface standard; R uses this class to denote date-times)
• POSIXct is the data type in which date-times are stored in the same way on every OS, as the number of seconds since 1970-01-01 00:00:00 UTC.
• ct stands for calendar time.
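A sketch of the manual conversion described above, turning text into Date and POSIXct objects; the dates and names are made up for illustration:

```r
# Text is converted to a Date explicitly.
d <- as.Date("2024-01-15")
class(d)   # "Date"

# Text is converted to a date-time; the time zone is stated explicitly.
dt <- as.POSIXct("2024-01-15 10:30:00", tz = "UTC")
class(dt)  # "POSIXct" "POSIXt"

# Behind the scenes a POSIXct value is a count of seconds since 1970-01-01 UTC.
seconds_since_epoch <- as.numeric(dt)

# Date arithmetic works once the values have the right type.
days_elapsed <- as.numeric(d - as.Date("2024-01-01"))
print(days_elapsed)
```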
• Try all the examples in RStudio.

Practice data types, operator examples


Note: Make sure to consult R documentation or other reliable resources for more advanced
usage and functions.

Packages in R
• Packages, also known as libraries, are a crucial component of the R programming environment. They expand the capabilities of base R by offering an extensive range of additional functions, tools, and documentation. These packages are indispensable for a variety of tasks, including data manipulation, data visualization, statistical analysis, and more. Essentially, packages are collections of pre-written code that enhance R’s utility, making it a potent language for data analysis and computational tasks.
What Constitutes a Package?
• A package is essentially a collection of functions bundled together. These functions are systematically organized within the package, which is often distributed as a compressed archive or a regular directory. You can think of a package as a specialized toolbox tailored for specific tasks. For instance, a package dedicated to financial data analysis might contain functions for account management, and it could be called the “AccManipulation” library.
Utilizing Packages in R
• Using a package in R involves several key steps:


Package Installation:
• Before making use of a package, you must install it with the install.packages("package_name") function. For instance, to install the widely used “ggplot2” package for data visualization, run install.packages("ggplot2").

Package Loading:
• Once a package is installed, it needs to be loaded into your current R session with the library(package_name) function. For example, to load the “ggplot2” package, run library(ggplot2).

Employing Package Functions:


• After loading a package, you gain access to its functions, which are designed to simplify and streamline various tasks. As an illustration, the “ggplot2” package provides the ggplot() function, which is useful for creating intricate visualizations. Here’s an instance of its application:

Diverse Categories of Libraries in R


• Libraries in R can be broadly classified into two primary categories:

System Libraries:
• System libraries come pre-installed with R by default. They are developed and maintained by the core developers of R. When you install R (with the base option), you get these libraries automatically. System libraries include “utils,” which provides essential R functionalities, and “stats,” which facilitates statistical operations.

User Libraries:
• In contrast, user libraries are third-party libraries created and maintained by the R community. These libraries are typically available for download from the Comprehensive R Archive Network (CRAN) repository. Functions in user libraries may be implemented in diverse programming languages, including R (the native language), C++, Perl, Java, Julia, and more. These libraries provide an extensive array of specialized tools and functions tailored to a wide range of needs.


Importing Data into R


• Data import is a fundamental step in any data analysis process. In R, you can efficiently import
data from various file formats such as CSV, Excel (XLSX), and text (TXT) files. Let’s explore
these methods in-depth:

CSV Files
• Method: To import data from a CSV file, use the read.csv() function.
• Syntax: read.csv(file, header = TRUE, sep = ",", ...)
• Description: This function is specifically designed to read data from comma-separated values
(CSV) files. Here’s a breakdown of the parameters:
• file: Specifies the path to the CSV file.
• header: A logical value indicating whether the first row contains column names.
• sep: Defines the delimiter used in the CSV file (typically a comma).

• Example:
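A self-contained sketch: it first writes a tiny CSV to a temporary file so that there is something to import. The file contents and the name people are illustrative assumptions:

```r
# Create a small CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "John,30", "Asha,25"), csv_path)

# Import it; the first row supplies the column names.
people <- read.csv(csv_path, header = TRUE, sep = ",")

print(people)
str(people)
```

In practice csv_path would be the path to your own file, for example a file inside your project folder.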

Excel Files (XLSX)


• Method: For importing data from Excel files (XLSX), the readxl package is handy.
• Syntax: read_xlsx(path, sheet = 1, range = NULL, col_names = TRUE, ...)
• Description: This function facilitates data import from Excel files. Here are the key arguments:
• path: Specifies the path to the Excel file.
• sheet: Indicates the sheet name or index from which to import data.
• range: Defines the range of cells to import (optional).
• col_names: A logical value that determines whether to read column names.

• Example:

Text Files (TXT)


• Method: To read data from text files (TXT), you can employ the read.table() function.
• Syntax: read.table(file, header = FALSE, sep = "", ...)
• Description: This function is versatile and can read tabular data from text files. Key arguments
include:
• file: The path to the text file.
• header: A logical value indicating whether the first row contains column names.
• sep: Specifies the field separator used in the text file.


• Example:
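A sketch of read.table() on a tab-separated text file, again written to a temporary file first; the data and names are made up:

```r
# Create a small tab-separated text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("city\ttemp", "Delhi\t31.5", "Pune\t27.0"), txt_path)

# header = TRUE because the first row holds column names; sep = "\t" for tabs.
weather <- read.table(txt_path, header = TRUE, sep = "\t")

print(weather)
```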

Exporting Data from R


• Data export is a crucial step in the data analysis process, allowing you to save your results and
share them with others. In R, you can efficiently export data to various file formats such as CSV,
Excel (XLSX), and text (TXT) files. Let’s explore these methods in-depth:
CSV Files
• Method: To export data to a CSV file, use the write.csv() function.
• Syntax: write.csv(data, file, row.names = FALSE)
• Description: This function is tailored for writing data frame data to a comma-separated values
(CSV) file. Here’s what the parameters mean:
• data: The data frame you want to export.
• file: Specifies the output file path.
• row.names: A logical value indicating whether row names should be included in the output file.

• Example:
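A minimal export sketch; the scores data frame and the output location in the temporary directory are illustrative:

```r
scores <- data.frame(student = c("A", "B"), score = c(88, 92))

# Write to a file in the session's temporary directory.
out_path <- file.path(tempdir(), "scores.csv")
write.csv(scores, out_path, row.names = FALSE)

# Read the file back to confirm what was written.
check <- read.csv(out_path)
print(check)
```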

Excel Files (XLSX)


• Method: For exporting data to Excel files (XLSX), you can use the openxlsx package.
• Syntax: write.xlsx(data, file, sheetName = "Sheet1", ...)
• Description: This function writes data frame data to an Excel file. The arguments are:
• data: The data frame you want to export.
• file: Specifies the output file path.
• sheetName: The name of the worksheet that receives the data.

• Example:

Text Files (TXT)


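• Method: To export data to text files (TXT), you can use the write.table() function.
• Syntax: write.table(data, file, sep = "\t", row.names = FALSE)
• Description: This function is versatile and can write data frame data to a text file. The arguments are:
• data: The data frame you want to export.
• file: Specifies the output file path.
• sep: The field separator written between values (a tab in this syntax).
• row.names: A logical value indicating whether row names should be included in the output file.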
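A sketch of a tab-separated export with write.table(); the inventory data and file location are assumptions for illustration:

```r
inventory <- data.frame(item = c("pen", "book"), quantity = c(120, 45))

txt_out <- file.path(tempdir(), "inventory.txt")
write.table(inventory, txt_out, sep = "\t", row.names = FALSE)

# Read it back with the matching separator to confirm the round trip.
inventory_check <- read.table(txt_out, header = TRUE, sep = "\t")
print(inventory_check)
```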


Reserved Keywords in R Programming


• In the realm of programming, keywords hold a special significance as they are reserved by a programming
language for specific purposes or commands. These keywords cannot be repurposed as variable names due
to their predefined roles within the language. They are often referred to as “reserved names.”
• Similar to languages like C, C++, and Java, R also possesses a set of reserved keywords. These keywords
have predefined meanings or functions within the R programming language. Attempting to use them as
variable names is not allowed.
• To access the complete list of reserved keywords in R, you can run the ?reserved command or the help(reserved) command.


If Statement:
• The if statement is used to conditionally execute one or more statements based on a Boolean expression in R programming.

• Example:
• a <- 12
• if (a < 16) {
• cat("I am less than 16")
• }
• Output: I am less than 16

Else statement:
• The else statement is used in conjunction with the if statement. It specifies the code block to be
executed when the if condition is false.

• Example:
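A minimal sketch of an if/else block; the variable a and the messages are illustrative:

```r
a <- 20

if (a < 16) {
  result <- "a is less than 16"
} else {
  # This branch runs because the if condition is FALSE.
  result <- "a is 16 or greater"
}

cat(result, "\n")
```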

• Output:

Repeat Keyword:
• The repeat keyword is used to create a loop that iterates indefinitely. To exit the loop, the break
statement is used.

• Example:
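A sketch of a repeat loop that stops itself with break after three iterations; the counter name is illustrative:

```r
count <- 0

repeat {
  count <- count + 1
  cat("Iteration", count, "\n")

  # Without this break the loop would run forever.
  if (count >= 3) {
    break
  }
}
```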


• Output:

While Keyword:
• The while keyword is used to create a loop that executes a block of code as long as a specified
condition is true.

• Example:
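A sketch of a while loop that adds the numbers 1 to 5; the names i and total are illustrative:

```r
i <- 1
total <- 0

# The body runs as long as the condition stays TRUE.
while (i <= 5) {
  total <- total + i
  i <- i + 1
}

cat("Total:", total, "\n")
```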

• Output:

Function Keyword:
• The function keyword is used to define user-defined functions in R. Functions encapsulate a set of
statements that can be executed as a single unit.

• Example:
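A minimal user-defined function, sketched with the hypothetical name square:

```r
# Define a function that takes one argument and returns its square.
square <- function(x) {
  x^2
}

# Call it like any built-in function.
squared_value <- square(4)
print(squared_value)
```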


• Output:

for Loop:
• The for keyword is used for looping or iterating over a sequence (e.g., vector, list).

• Example:
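A sketch of a for loop over a character vector; the fruit names are illustrative:

```r
fruits <- c("Apple", "Banana", "Cherry")
name_lengths <- integer(0)

# The loop variable takes each element of the vector in turn.
for (fruit in fruits) {
  cat("Fruit:", fruit, "\n")
  name_lengths <- c(name_lengths, nchar(fruit))
}

print(name_lengths)
```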

• Output:

Next Keyword:
• The next keyword is used to skip the current iteration of a loop without terminating it, proceeding to the
next iteration.


• Example:
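A sketch that uses next to skip even numbers and keep the odd ones; the names are illustrative:

```r
odd_numbers <- integer(0)

for (i in 1:6) {
  # Skip the rest of this iteration for even numbers; the loop continues.
  if (i %% 2 == 0) next
  odd_numbers <- c(odd_numbers, i)
}

print(odd_numbers)
```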

• Output:

Break Keyword:
• The break keyword is used to terminate a loop prematurely if a specified condition is met.

• Example:
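A sketch that uses break to leave a loop early; the threshold of 4 is an arbitrary illustration:

```r
visited <- integer(0)

for (i in 1:10) {
  # Stop the whole loop as soon as i exceeds 4.
  if (i > 4) break
  visited <- c(visited, i)
}

print(visited)
```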

• Output:


TRUE/FALSE:
• TRUE and FALSE are keywords used to represent Boolean true and false values in R, respectively.

NULL:
• NULL represents the null object in R and is used for an absent or undefined object; missing values inside a vector are represented by NA instead.

• Example:
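A short sketch of how NULL behaves; the object names are illustrative:

```r
empty <- NULL

is.null(empty)   # TRUE
length(empty)    # 0: NULL has no elements

# Assigning NULL to a list element removes that element.
settings <- list(a = 1, b = 2)
settings$b <- NULL
names(settings)
```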

• Output:

Inf and NaN:


• Inf and -Inf represent positive and negative infinity, and NaN represents “Not a Number” (an undefined numeric result such as 0/0) in R.

NA:
• NA is a logical constant representing missing values in R, and it can be used with different atomic
vector types.
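The special values can be produced and tested as in this sketch; the readings vector is an illustrative assumption:

```r
pos_inf <- 1 / 0    # Inf
neg_inf <- -1 / 0   # -Inf
not_num <- 0 / 0    # NaN

is.infinite(pos_inf)
is.nan(not_num)

# NA marks a missing value inside a vector.
readings <- c(1, NA, 3)
is.na(readings)

mean(readings)                              # NA: the missing value propagates
clean_mean <- mean(readings, na.rm = TRUE)  # 2: NA is dropped first
print(clean_mean)
```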


R Functions
• In the realm of R programming, functions serve as organized sets of statements designed to execute
specific tasks. While R boasts an array of pre-existing functions, users also have the liberty to craft their
own customized functions. Functions occupy a pivotal role in programming, fostering modularity and
reusability.
• Functions guard against the twin perils of code repetition and complexity. They effectively break code into smaller, logically structured components. A well-crafted R function adheres to the following principles:
• It is tailored to execute a precise task.
• It may accept arguments (input values) as needed.
• It encompasses a body where the task-specific code is housed.
• It may, optionally, yield one or more output values.
• In R, functions come to life through the function keyword. The syntax for defining an R function takes the
following shape:
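A sketch of the general shape, followed by one concrete function. The names function_name, arg_1, arg_2, and rectangle_area are placeholders chosen for illustration:

```r
# General shape (placeholders, shown as comments):
# function_name <- function(arg_1, arg_2, ...) {
#   function body
#   return(value)
# }

# A concrete instance: two arguments, a body, and a returned value.
rectangle_area <- function(length, width) {
  area <- length * width
  return(area)
}

area_result <- rectangle_area(4, 5)
print(area_result)
```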

• Functions stand as the cornerstone of R programming, facilitating code organization, reusability, and
manageability.

Components of Functions


• R functions have four foundational elements that define their structure and behavior:
• Function Name: This denotes the specific name assigned to the function. In R, functions are treated as objects and are stored with their corresponding names.
• Arguments: Arguments are placeholders within a function. Arguments can be optional, meaning the function may or may not need input values, and they can be given default values. When a function is called, values are supplied to these arguments.
• Function Body: The function body is the collection of statements that define the actions and operations the function carries out. It contains the logic and tasks executed by the function.
• Return Value: The return value is the last expression evaluated in the function body; it is returned as the output of the function.

Types of Functions

• R functions are generally categorized into two principal types:


• Built-in Function: These functions are predefined in R and can be used without being defined by the user; you simply call them. R has many built-in functions, including seq(), mean(), max(), and sum().
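A sketch calling the four built-in functions named above; the vector x is illustrative:

```r
x <- c(3, 7, 1, 9)

steps <- seq(1, 10, by = 3)   # 1 4 7 10
avg   <- mean(x)              # 5
top   <- max(x)               # 9
total <- sum(x)               # 20

print(steps)
print(c(avg, top, total))
```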


• Output:

• User-defined Function: R empowers users to craft their own customized functions to cater to specific
programming needs. Once formulated, these user-defined functions can be employed in a manner akin to
built-in functions.

Function Calls
• Functions can be called in various ways:
• With Arguments: Passing values to the function when calling it, supplying necessary
input.
• Without Arguments: Invoking a function without providing input values.
• With Argument Values: Supplying arguments in the same sequence as defined in the
function or using argument names.
• With Default Arguments: Functions can have default argument values, which are used if
no values are provided during the function call.
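The four calling styles can be sketched with two small hypothetical functions, power and greet:

```r
# exponent has a default value, so it may be omitted in a call.
power <- function(base, exponent = 2) {
  base^exponent
}

# A function that takes no arguments at all.
greet <- function() {
  "Hello from R"
}

by_default  <- power(3)                      # default argument used: 3^2
by_position <- power(3, 3)                   # arguments matched by position: 3^3
by_name     <- power(exponent = 3, base = 2) # arguments matched by name: 2^3
no_args     <- greet()                       # called without arguments

print(c(by_default, by_position, by_name))
print(no_args)
```

When arguments are named, their order in the call no longer matters.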

Built-in Functions in R
• Built-in functions, also referred to as predefined functions, come prepackaged with the R programming language. R has an extensive set of these functions, covering a wide range of tasks and operations. They are conveniently categorized according to their functionality.


Mathematical Functions
• R offers a diverse array of built-in mathematical functions. These functions assist with mathematical calculations and manipulations, enhancing the numerical capabilities of the language.
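A few commonly used mathematical functions from base R, as a sketch (this selection is illustrative and not the original's list):

```r
abs(-7.5)            # absolute value: 7.5
sqrt(49)             # square root: 7
ceiling(4.2)         # round up: 5
floor(4.8)           # round down: 4
round(3.14159, 2)    # round to 2 decimal places: 3.14
exp(1)               # e raised to the power 1
log(100, base = 10)  # logarithm to base 10
```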


Character Functions
• Character functions in R serve as indispensable tools for managing and manipulating character data.
These functions empower users to handle and transform text-based information effectively.
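A sketch of some base R character functions (an illustrative selection; the sentence is made up):

```r
sentence <- "Data Science with R"

toupper(sentence)                      # all upper case
tolower(sentence)                      # all lower case
n_chars <- nchar(sentence)             # number of characters, spaces included
first_word <- substr(sentence, 1, 4)   # characters 1 to 4
words <- strsplit(sentence, " ")[[1]]  # split on spaces into a character vector
joined <- paste("Hello", "World", sep = ", ")
replaced <- gsub("Science", "Analysis", sentence)

print(list(n_chars, first_word, words, joined, replaced))
```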


Statistical Probability Functions


• R provides a suite of statistical probability functions for probability-related computations. These functions are essential for conducting statistical analyses and probability assessments.
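A sketch of base R probability functions for the normal and binomial distributions (an illustrative selection, not the original's list):

```r
dnorm(0)                 # density of the standard normal at 0
p_below <- pnorm(1.96)   # probability of a value below 1.96, about 0.975
cutoff  <- qnorm(0.975)  # value below which 97.5% of the distribution lies, about 1.96

# Random draws; set.seed() makes them reproducible.
set.seed(42)
draws <- rnorm(5)

# Probability of exactly 2 successes in 5 trials with success probability 0.5.
two_of_five <- dbinom(2, size = 5, prob = 0.5)

print(c(p_below, cutoff, two_of_five))
```

The prefixes follow a pattern: d for density, p for cumulative probability, q for quantile, and r for random generation.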


Other Statistical Functions


• Apart from probability-related calculations, R encompasses various other statistical functions designed to
facilitate diverse statistical operations.
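A sketch of common descriptive statistics in base R; the scores vector is made up for illustration:

```r
scores <- c(12, 15, 11, 18, 14)

mean(scores)      # arithmetic mean: 14
median(scores)    # middle value: 14
var(scores)       # sample variance: 7.5
sd(scores)        # sample standard deviation: square root of the variance
range(scores)     # smallest and largest value: 11 18
quantile(scores)  # minimum, quartiles, and maximum
```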


R Data Visualization
• Creating compelling data visualizations in R is both accessible and powerful, thanks to R’s built-in capabilities. Data visualization is a valuable method for extracting meaningful insights from data by representing it visually. Visualizations bring to light complex data patterns that might otherwise go unnoticed.
• Data visualization techniques make it easier to explore and comprehend large datasets, simplifying the process of uncovering essential insights.

R Visualization Packages
• R provides a wealth of packages designed specifically for data visualization tasks. These packages include
a wide range of tools and libraries:


1) Plotly
• Description: The plotly package offers a platform for creating high-quality, interactive online graphs. It builds upon the JavaScript library “plotly.js” to provide users with visually appealing and interactive charting capabilities.
• Syntax: To create a plotly chart, you can use the plot_ly() function.

• When to Use: Use plotly when you need to generate web-based interactive graphs with features like zooming, hovering tooltips, and more.

2) ggplot2
• Description: ggplot2 is a widely used R package for creating elegant and high-quality declarative
graphics. It stands out for its ability to generate aesthetically pleasing and customizable graphs.
• Syntax: To create a ggplot2 chart, you use the ggplot() function and add layers with the + operator.

• When to Use: ggplot2 is ideal for creating static, publication-quality plots and customizing
visuals to suit specific requirements.


3) Tidyquant
• Description: tidyquant is a financial package within the tidyverse ecosystem. It’s designed for importing, analyzing, and visualizing financial data, making it a valuable tool for quantitative financial analysis.
• Syntax: tidyquant offers a range of functions for financial data analysis and visualization, often used in combination with other tidyverse packages like dplyr and ggplot2.
• When to Use: Use tidyquant when working with financial data, performing stock market analysis, or creating financial visualizations.

4) taucharts:
• Description: taucharts is a library that focuses on data visualization, allowing users to map data fields
to visual properties declaratively. It provides a flexible platform for exploring data visually.
• Syntax: You can create taucharts by specifying data mappings and visual properties using R functions.
• When to Use: taucharts is suitable for exploratory data visualization and when you want to create custom, interactive charts.

5) ggiraph
• Description: ggiraph is a tool for creating dynamic ggplot2 graphs. It enables the addition of tooltips, JavaScript actions, and animations to ggplot2 graphics, enhancing their interactivity.
• Syntax: You build a ggplot2 object with ggiraph’s interactive geoms and then render it as an interactive graphic. For instance, you can add tooltips by mapping the tooltip aesthetic in geom_point_interactive() and displaying the plot with girafe().
• When to Use: ggiraph is great when you need to make your ggplot2 graphs interactive and informative.

6) geofacet
• Description: The geofacet package extends ‘ggplot2’ to provide geofaceting functionality. Geofaceting allows arranging plots for different geographical entities into a grid, preserving geographical orientation.
• Syntax: You use ggplot2 syntax and replace the usual faceting call with the geofacet function facet_geo() to create geofaceted plots.
• When to Use: geofacet is valuable when you want to create grid-based geographic visualizations with ggplot2.

7) googleVis
• Description: googleVis facilitates creating interactive web pages with Google’s chart tools based on
R data frames. It acts as a bridge between R and Google’s charting capabilities.
• Syntax: You can create various Google charts using functions like gvisPieChart() or gvisLineChart().
• When to Use: googleVis is suitable for generating web-based interactive charts, especially
when integrating R with web applications.


8) RColorBrewer
• Description: RColorBrewer provides a collection of color schemes designed by Cynthia Brewer. It’s
particularly useful for selecting color palettes for maps and other visualizations.
• Syntax: You can set color palettes using functions like brewer.pal() and apply them to your plots.
• When to Use: RColorBrewer is handy when you want aesthetically pleasing and distinguishable color schemes for your visualizations.

9) dygraphs
• Description: dygraphs is an R interface to the dygraphs JavaScript charting library. It specializes in
charting time-series data with interactive features.
• Syntax: You use the dygraph() function to create interactive time-series plots.
• When to Use: dygraphs is ideal for visualizing and exploring time-series data with interactive features.

10) shiny
• Description: R’s shiny package enables the development of interactive web applications with ease.
It offers extensions, HTML widgets, CSS, and JavaScript to create aesthetically pleasing web apps.
• Syntax: shiny uses a reactive framework, and you define the app’s components within a shiny app
object.
• When to Use: shiny is the go-to choice for building interactive web applications that utilize R’s data visualization capabilities.

R Graphics
• Graphics play a crucial role in highlighting key aspects of data. They enable us to explore the distribution of individual variables, uncover connections between different variables, and provide concise summaries, particularly when dealing with extensive datasets. Graphics are a valuable addition to various statistical and computational methods, enhancing our ability to analyze data effectively.


Standard Graphics
• Standard graphics in R are made accessible through the graphics package, and they encompass a variety
of functions designed for generating statistical plots. These plots include commonly used visualizations
such as scatterplots, pie charts, boxplots, barplots, and more. What makes them convenient is that you can
create these graphs with just a single function call.

Graphics Devices
• A graphics device is essentially the canvas where your plots or visuals appear. It can be your computer
screen (a screen device), a PDF file (a file device), a Scalable Vector Graphics (SVG) file (another file
device), or even a PNG or JPEG image file (yes, also a file device).
• Here are some important points to grasp about graphics devices:
• The functions related to graphics devices produce output specific to the active graphics device.
• The default and most commonly used device is the screen device, where you see your plots.
• R offers various graphics devices like the PDF device, the JPEG device, etc., each suited for different
output formats.
• You simply need to specify the graphics output device you want to use, and R takes care of producing the
correct output for that device.
• You can have multiple devices open at once, but only one can be active at a time.
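A sketch of directing a plot to a file device instead of the screen. It uses the PDF device with a temporary file; the file location and the choice of the built-in mtcars data are illustrative:

```r
plot_path <- tempfile(fileext = ".pdf")

# Open a PDF file device; it becomes the active device.
pdf(plot_path, width = 6, height = 4)

# Plotting functions now draw into the PDF instead of the screen.
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel efficiency vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# Close the device so the file is written completely.
dev.off()

file.exists(plot_path)
```

Forgetting dev.off() is a common cause of empty or unreadable output files.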

Basics of the Grammar of Graphics


• The grammar of graphics consists of fundamental elements that make up a statistical graphic. Let’s
break down these elements:
• Data: Data is the raw information processed to create a graphic.
• Aesthetic Mappings: These mappings connect variables in your data to visual properties. For instance, mapping temperature data to the X-axis in a scatter plot.
• Geometric Objects: Geometric objects represent individual data points on your plot, using the aesthetic mappings to position them.
• Statistical Transformations: These transformations compute statistical summaries of your data, like adding a regression line or counting occurrences of specific values.
• Scales: Scales are used to map data values onto the coordinate system of the graphics device, determining how data is represented.
• Coordinate System: The coordinate system defines how data points are placed and visualized, whether Cartesian or other custom systems.
• Faceting: Faceting involves splitting data into subgroups and generating sub-graphs for each subgroup.

Advantages of Data Visualization in R


• Enhanced Understanding: Graphics and charts are often more engaging and easier to comprehend
than plain text and numbers. They cater to a wider audience and facilitate better decision-making.
• Efficiency: Data visualization can present a large amount of information compactly. This is particularly useful in complex decision-making scenarios where multiple factors are involved.
• Location Insights: Geographic maps and GIS features in data visualization can be valuable when
location plays a crucial role in your analysis.
Disadvantages of Data Visualization in R


• Cost: Developing data visualization applications can be expensive, especially for small businesses.
Employing professionals to create complex visuals may increase costs.
• Distraction: Overly complex or flashy visuals can sometimes detract from the actual insights. It’s
important to maintain a balance between aesthetics and functionality.

R Pie Charts
• In R, you can create pie charts to visualize data using the pie () function. Pie charts represent data as slices
of a circle, each labeled and colored differently. Although pie charts are not the preferred choice in R due
to their limitations, you can still create them for basic visualization. Here’s how to do it:

• Syntax for the pie() function:

• x: A vector containing numeric values for the pie chart.
• labels: Descriptions for each slice.
• radius: Sets the radius of the pie chart.
• main: Adds a title to the chart.
• col: Specifies the colors of the slices.
• clockwise: A logical value indicating whether slices are drawn clockwise (TRUE) or counterclockwise
(FALSE, the default).
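
• A basic call using the parameters above might look like the following sketch. The sales figures and product names are hypothetical:

```r
# Hypothetical data: units sold per product
sales <- c(21, 62, 10, 53)
products <- c("Laptops", "Phones", "Tablets", "Monitors")

# Basic pie chart with labeled slices drawn clockwise
pie(sales, labels = products, radius = 0.9, clockwise = TRUE)
```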


Adding Title and Custom Colors:


• You can improve your pie chart by adding a title and using custom colors. The main parameter adds a title,
while the col parameter defines the colors.

• Example:
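
• As a sketch with hypothetical data, the main and col parameters can be combined like this. The rainbow() palette is one possible choice of colors:

```r
# Hypothetical data: units sold per product
sales <- c(21, 62, 10, 53)
products <- c("Laptops", "Phones", "Tablets", "Monitors")

# One color per slice, plus a chart title
slice_colors <- rainbow(length(sales))
pie(sales, labels = products, main = "Units sold by product", col = slice_colors)
```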

Displaying Percentage and Adding Legends:


• You can display data as percentages and add legends using the legend() function. This function allows you
to specify legend text, fill colors, and other attributes.
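
• A minimal sketch of percentage labels with a legend, assuming the same hypothetical sales data. The legend position "topright" and text size are illustrative choices:

```r
# Hypothetical data: units sold per product
sales <- c(21, 62, 10, 53)
products <- c("Laptops", "Phones", "Tablets", "Monitors")

# Convert counts to percentages and use them as slice labels
pct <- round(100 * sales / sum(sales), 1)
slice_labels <- paste0(pct, "%")
slice_colors <- rainbow(length(sales))

pie(sales, labels = slice_labels, main = "Share of units sold", col = slice_colors)

# The legend maps each color back to its product name
legend("topright", legend = products, fill = slice_colors, cex = 0.8)
```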


Creating a 3D Pie Chart:


• You can also create three-dimensional (3D) pie charts using the pie3D() function from the plotrix
package. The parameters are similar to those of the pie() function.
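
• The sketch below assumes the plotrix package is installed and skips the plot otherwise. The data and the explode value, which separates the slices slightly, are illustrative:

```r
# Hypothetical data: units sold per product
sales <- c(21, 62, 10, 53)
products <- c("Laptops", "Phones", "Tablets", "Monitors")

# Draw the 3D pie only when plotrix is available
if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pie3D(sales, labels = products, explode = 0.1,
                 main = "Units sold by product")
}
```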


Bar Charts in R
• A bar chart is a graphical representation that uses bars or rectangles of equal width to display the numerical
values of categorical data. Each bar’s length or height is proportional to the value it represents. Bar charts
are a useful way to summarize and visualize categorical data in R.
• To create bar charts in R, we can use the barplot() function, which has the following syntax:

• Here’s a breakdown of the parameters:


• height: The numerical values that you want to visualize in the bar chart. These values
determine the height of the bars.


• names.arg: Provides labels for each bar on the x-axis. It corresponds to the
categories or variables you are summarizing.
• main: Adds a title or main heading to your bar chart.
• col: Specifies the color of the bars.

• Example:
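
• A minimal sketch with hypothetical monthly revenue figures. barplot() also returns the x-positions of the bar midpoints, which is captured here in case labels need to be added later:

```r
# Hypothetical data: revenue per month
revenue <- c(7, 12, 28, 3, 41)
months <- c("Mar", "Apr", "May", "Jun", "Jul")

# Draw the bars and keep the midpoints that barplot() returns
bar_midpoints <- barplot(revenue, names.arg = months,
                         main = "Revenue by month", col = "steelblue")
```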


Boxplots in R
• Boxplots are a valuable tool for understanding the distribution of data within a dataset. They summarize
the data using quartiles, allowing you to visualize key statistics such as the minimum, first quartile,
median, third quartile, and maximum. Boxplots are particularly useful for comparing the distribution of
data across multiple datasets.
• You can create boxplots using the boxplot() function, which follows this syntax:

Breakdown of the parameters:


• x: The data for which you want to create a boxplot, given as a vector of numerical values
or a formula.
• data: If your data is stored in a data frame or list, you can specify it using this parameter.
• notch: Set this parameter to TRUE to create notched boxplots. Notches provide a visual
indication of the median’s confidence interval.
• varwidth: Set this parameter to TRUE to adjust the width of the boxes based on the sample sizes.
• names: Labels the boxplots on the x-axis, providing a name for each boxplot.
• main: Adds a title or main heading to your boxplot.
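
• As an illustration, the sketch below uses the built-in mtcars dataset (an assumption, not the dataset from the text) to compare fuel economy across cylinder counts. The group labels are hypothetical:

```r
# mpg ~ cyl draws one box of mpg values per distinct cyl value
bp <- boxplot(mpg ~ cyl, data = mtcars,
              varwidth = TRUE,                       # wider boxes for larger groups
              names = c("4 cyl", "6 cyl", "8 cyl"),  # label for each box
              main = "Fuel economy by cylinder count")

# boxplot() invisibly returns the statistics it plotted
bp$n      # number of observations per group
bp$stats  # lower whisker, first quartile, median, third quartile, upper whisker
```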

R Histograms
• A histogram is a graphical representation that displays the frequency distribution of numerical values within
a dataset. It’s particularly useful for understanding how data is distributed across various value ranges. In
a histogram, the data is divided into intervals or bins, and each bar represents the number of values falling
within a specific range.
• Unlike bar charts, which are used to compare different categories or entities, histograms focus on visualizing
the distribution of a single dataset.
• You can create histograms using the hist() function, which takes a vector as input and offers several
parameters for customization. The syntax of the hist() function is:

Let’s break down these parameters:


• v: The vector of numerical values for which you want to create a histogram.
• main: Adds a title or main heading to your histogram.
• xlab and ylab: Label the x-axis and y-axis, respectively.
• xlim and ylim: Set limits for the x-axis and y-axis to control the
visible range in your histogram.
• breaks: Suggests the number of bins or intervals in your histogram. R treats a single number
as a suggestion and may adjust it to produce round break points.


• col and border: Customize the color of the bars and their borders with these parameters.
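
• A minimal sketch using a hypothetical vector of values. The axis limits and colors are illustrative, and the returned object is kept so the bin counts can be inspected:

```r
# Hypothetical data
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# hist() draws the plot and returns the bins it used
h <- hist(v, main = "Distribution of values",
          xlab = "Value", ylab = "Frequency",
          xlim = c(0, 50), ylim = c(0, 5),
          breaks = 5,                      # a suggestion; R picks round break points
          col = "lightgreen", border = "darkgreen")

h$breaks  # bin boundaries actually used
h$counts  # number of values in each bin
```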

R Line Graphs
• A line graph, also known as a line chart, is a visual representation of data that changes continuously over
time. In a line graph, data points are connected with lines to illustrate the continuous progression of values.
These lines can move both upward and downward, reflecting changes in the data.


• Line graphs are valuable for comparing different events, situations, or trends over time. They are particu-
larly effective at showing patterns and trends in data.
• To create a line graph in R, you can use the plot() function, which has the following syntax:

Let’s break down the parameters:


• v: The data vector that you want to plot as a line graph. It contains numerical
values that change over time.
• type: Specifies the type of plot, such as “l” for lines only or “b” for both points and lines. The
default, “p”, draws points only.
• col: Customizes the color of the lines in your graph.
• xlab and ylab: Label the x-axis and y-axis, respectively, providing context
for your graph.
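
• A minimal sketch with hypothetical rainfall values. The second series, added with lines(), is an illustrative extension showing how to compare two trends on one graph:

```r
# Hypothetical data: rainfall in two cities over five months
v <- c(7, 12, 28, 3, 41)
w <- c(14, 7, 6, 19, 3)

# Points connected by lines for the first series
plot(v, type = "b", col = "red",
     xlab = "Month", ylab = "Rainfall (mm)", main = "Monthly rainfall")

# Overlay the second series on the existing plot
lines(w, type = "b", col = "blue")
```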


R Scatterplots
• Scatterplots are invaluable for comparing variables and understanding the relationships between them.
They provide a visual representation of how one variable varies with another. In a scatterplot, data is
presented as a collection of individual points, with each point denoting the values of two variables. One
variable is plotted on the horizontal axis, while the other is plotted on the vertical axis.
• In R, you have two primary methods for creating scatterplots: using the plot() function or using
functions from the ggplot2 package.
• Here is the basic syntax for creating a scatterplot in R:

Let’s break down these parameters:


• x and y: The two variables you want to compare in your scatterplot. x corresponds to the
values on the horizontal axis, while y corresponds to the values on the vertical axis.
• main: Specifies a title or main heading for your scatterplot.
• xlab and ylab: Label the x-axis and y-axis, respectively, providing context
for your plot.
• xlim and ylim: Define the limits of the x-axis and y-axis, which can be useful for
focusing on a specific range of values.
• axes: If set to FALSE, suppresses the plotting of axes, which can be helpful
in certain situations.
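
• The following sketch applies these parameters to the built-in mtcars dataset, which is an assumption made for illustration. The axis limits are chosen to cover the full range of both variables:

```r
# Car weight on the horizontal axis, fuel economy on the vertical axis
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Weight vs. fuel economy",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     xlim = c(1, 6), ylim = c(10, 35),
     axes = TRUE)
```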


Data Exploration with R


• Data exploration is a crucial step in the data analysis process, helping you understand the structure and
characteristics of your dataset before diving into more advanced analyses or building predictive models.
In this guide, we’ll explore various data exploration techniques using R.

Importing Data:
• First, let’s import our dataset into R. We can use the read.csv() function to import a CSV file,
setting header = TRUE to indicate that the first row contains the column names:

• Alternatively, we can create sample data for demonstration purposes:
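
• Both approaches can be sketched together. The data frame name mydata and its columns are hypothetical, and the CSV file is written to a temporary location only so that read.csv() has something to read:

```r
# Hypothetical sample data with a few missing values
mydata <- data.frame(
  id = 1:6,
  age = c(25, 31, NA, 42, 37, 29),
  gender = c("F", "M", "F", NA, "M", "F"),
  score = c(88.5, 92.0, 79.5, 85.0, NA, 90.5)
)

# Write the data to a temporary CSV file, then import it again
csv_path <- tempfile(fileext = ".csv")
write.csv(mydata, csv_path, row.names = FALSE)
imported <- read.csv(csv_path, header = TRUE)
```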


1. Calculate Basic Descriptive Statistics:


• To get an overview of your data, you can calculate basic descriptive statistics for the entire dataset:

• You can also calculate the summary statistics for a specific column, either by index or by name:
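
• A minimal sketch, assuming a hypothetical data frame named mydata:

```r
# Hypothetical sample data
mydata <- data.frame(
  age = c(25, 31, NA, 42, 37, 29),
  gender = c("F", "M", "F", NA, "M", "F"),
  score = c(88.5, 92.0, 79.5, 85.0, NA, 90.5)
)

summary(mydata)                       # every column
by_index <- summary(mydata[, 1])      # first column, selected by index
by_name  <- summary(mydata$age)       # same column, selected by name
```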

2. List Variable Names:


• To list the names of the variables (columns) in your dataset, you can use the names() function:

3. Calculate the Number of Rows and Columns:


• You can determine the number of rows and columns in your dataset using the nrow() and ncol()
functions, respectively:

4. Display Dataset Structure:


• To examine the structure of your dataset, including data types, you can use the str() function:

• This function reports the number of observations (rows) and variables (columns), along with the
type and first few values of each variable.
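
• Steps 2 to 4 can be sketched on a hypothetical data frame named mydata:

```r
# Hypothetical sample data
mydata <- data.frame(
  age = c(25, 31, NA, 42, 37, 29),
  gender = c("F", "M", "F", NA, "M", "F"),
  score = c(88.5, 92.0, 79.5, 85.0, NA, 90.5)
)

names(mydata)   # column names
nrow(mydata)    # number of rows
ncol(mydata)    # number of columns
str(mydata)     # dimensions plus the type and first values of each column
```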

5. Viewing Rows:
• You can view the first few rows of your dataset (six by default) using the head() function:

• You can specify the number of rows to display by providing an argument to head(). For example,
to display the first 5 rows:


6. Viewing Rows (Excluding Last Row):


• To view all rows except the last one, you can use head() with a negative value:

7. Viewing Last Rows:


• Similarly, you can view the last few rows of your dataset using the tail() function:

• You can specify the number of rows to display with tail() as well:

8. Viewing Rows (Excluding First Row):


• To view all rows except the first one, use tail() with a negative value:
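
• Steps 5 to 8 can be sketched together on a hypothetical six-row data frame named mydata:

```r
# Hypothetical sample data
mydata <- data.frame(id = 1:6, score = c(88.5, 92.0, 79.5, 85.0, 70.0, 90.5))

first_five    <- head(mydata, 5)    # first 5 rows
all_but_last  <- head(mydata, -1)   # everything except the last row
last_two      <- tail(mydata, 2)    # last 2 rows
all_but_first <- tail(mydata, -1)   # everything except the first row
```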

9. Selecting Random Rows:


• You can select random rows from your dataset using the sample_n() function from the dplyr package.
Install the package first if necessary, then load it with library(dplyr):

10. Selecting a Percentage of Random Rows:


• To select a specific percentage of random rows from your dataset, you can use sample_frac() from
the dplyr package. For instance, to select 10% of random rows:
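
• Steps 9 and 10 can be sketched as follows, assuming a hypothetical 20-row data frame. The dplyr calls run only if the package is installed, and a base R equivalent is shown alongside for comparison:

```r
set.seed(42)  # makes the random selection reproducible
mydata <- data.frame(id = 1:20, value = rnorm(20))

if (requireNamespace("dplyr", quietly = TRUE)) {
  three_rows  <- dplyr::sample_n(mydata, 3)       # 3 random rows
  ten_percent <- dplyr::sample_frac(mydata, 0.1)  # 10% of the rows
}

# Base R equivalent: sample row indices, then subset
three_rows_base <- mydata[sample(nrow(mydata), 3), ]
```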


11. Counting Missing Values:


• To count the number of missing values in each variable of your dataset, you can use the colSums()
function with is.na():

• Alternatively, you can achieve the same result using sapply():

12. Counting Missing Values in a Single Variable:


• If you want to count missing values in a specific variable (column), you can do so using sum()
and is.na():
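
• Steps 11 and 12 can be sketched on a hypothetical data frame named mydata that contains one missing value per column:

```r
# Hypothetical sample data
mydata <- data.frame(
  age = c(25, 31, NA, 42, 37, 29),
  gender = c("F", "M", "F", NA, "M", "F"),
  score = c(88.5, 92.0, 79.5, 85.0, NA, 90.5)
)

na_by_column  <- colSums(is.na(mydata))                         # per column
na_by_column2 <- sapply(mydata, function(col) sum(is.na(col)))  # same counts via sapply()
na_in_age     <- sum(is.na(mydata$age))                         # single column
```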

Data Manipulation with R


• In this beginner-friendly tutorial, we will explore essential data manipulation tasks in R, providing practi-
cal examples and code snippets for various operations.

1. Replacing/Recoding Values
• Replacing Values: You can update values within a dataset. For example, let’s replace all 1s with 6s
in the “Q1” variable:

• Recoding a Range: You can recode a range of values. For instance, transform 1 through 4 to 0 and
5 through 6 to 1:
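
• A minimal sketch of both operations, assuming a hypothetical variable Q1 with values from 1 to 6 and no missing values. The recoded values go into a new column so the original is preserved:

```r
# Hypothetical survey responses
mydata <- data.frame(Q1 = c(1, 3, 5, 6, 1, 4, 2))

# Replace all 1s with 6s in Q1
mydata$Q1[mydata$Q1 == 1] <- 6

# Recode a range: 1 through 4 become 0, 5 through 6 become 1
mydata$Q1_recoded <- ifelse(mydata$Q1 <= 4, 0, 1)
```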

2. Renaming Variables
• To rename variables in R, use the rename() function from the “dplyr” package:
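
• In rename(), the new name goes on the left and the old name on the right. The sketch below uses hypothetical column names, runs the dplyr call only if the package is installed, and shows a base R equivalent:

```r
mydata <- data.frame(Q1 = c(1, 3, 5), Q2 = c(2, 4, 6))

# dplyr: new_name = old_name
if (requireNamespace("dplyr", quietly = TRUE)) {
  renamed <- dplyr::rename(mydata, question1 = Q1)
}

# Base R equivalent
renamed_base <- mydata
names(renamed_base)[names(renamed_base) == "Q1"] <- "question1"
```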

3. Keeping and Dropping Variables


• Keeping Variables: To retain specific variables, select them using column indices:


• Dropping Variables: Exclude specific variables by their indices:
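
• Keeping and dropping by index can be sketched on a hypothetical four-column data frame:

```r
mydata <- data.frame(
  id = 1:3,
  age = c(25, 31, 42),
  gender = c("F", "M", "F"),
  score = c(88.5, 92.0, 85.0)
)

kept    <- mydata[, c(1, 3)]  # keep columns 1 and 3
dropped <- mydata[, -2]       # drop column 2, keep the rest
```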

4. Subset Data (Selecting Observations)


• Selecting Rows by Condition: Subset data based on conditions. For example, select rows where
“age” equals 3:

• Selecting Rows with Missing Values: Choose rows containing missing values using the is.na()
function:
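
• Both selections can be sketched on a hypothetical data frame. Note that subset() drops rows where the condition evaluates to NA:

```r
mydata <- data.frame(id = 1:5, age = c(3, 5, NA, 3, 4))

# Rows where age equals 3
age_three <- subset(mydata, age == 3)

# Rows where age is missing
missing_age <- mydata[is.na(mydata$age), ]
```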

5. Sorting
• Sort vectors with the sort() function and data frames with the order() function:
• Sorting a Vector: Sort a vector, e.g., in descending order:

• Sorting a Data Frame: Sort a data frame by a specific column, e.g., sort “Gender” in ascending
order and “SAT” in descending order:
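
• A minimal sketch with hypothetical values. Negating a numeric column inside order() sorts that column in descending order:

```r
# Sorting a vector in descending order
x <- c(3, 1, 4, 1, 5)
x_desc <- sort(x, decreasing = TRUE)  # 5 4 3 1 1

# Sorting a data frame: Gender ascending, SAT descending within each gender
mydata <- data.frame(Gender = c("M", "F", "F", "M"),
                     SAT = c(1200, 1350, 1100, 1400))
sorted <- mydata[order(mydata$Gender, -mydata$SAT), ]
```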

6. Value Labeling
• Use the factor() function for nominal data and ordered() for ordinal data to label values.
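
• As a sketch, the numeric codes and labels below are hypothetical:

```r
# Nominal data: codes with no inherent order
gender_codes <- c(1, 2, 2, 1)
gender <- factor(gender_codes, levels = c(1, 2), labels = c("Male", "Female"))

# Ordinal data: codes with a meaningful order
rating_codes <- c(3, 1, 2, 3)
rating <- ordered(rating_codes, levels = c(1, 2, 3),
                  labels = c("Low", "Medium", "High"))

rating[1] > rating[2]  # ordered factors support comparisons
```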

7. Dealing with Missing Data


• Identify and handle missing values in R:
• Number of Missing Values: Count missing values in a variable or row:


• Removing Rows with Missing Values: Create a new dataset without missing data:
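
• Counting and removing missing values can be sketched on a hypothetical data frame. na.omit() drops every row that has at least one missing value:

```r
mydata <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

na_in_x    <- sum(is.na(mydata$x))    # missing values in one variable
na_per_row <- rowSums(is.na(mydata))  # missing values in each row

# New dataset without missing data
complete <- na.omit(mydata)
```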

8. Aggregate by Groups
• Aggregate data by grouping variables. Calculate, for example, the mean of variable “x” by
grouped variable “y”:
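
• One way to do this in base R is aggregate() with a formula. The data below is hypothetical:

```r
mydata <- data.frame(y = c("a", "a", "b", "b", "b"),
                     x = c(10, 20, 30, 40, 50))

# Mean of x within each level of y
group_means <- aggregate(x ~ y, data = mydata, FUN = mean)
```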

9. Frequency for a Vector


• Calculate the frequency of unique values in a vector using the table() function.
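
• A minimal sketch with a hypothetical character vector:

```r
colors <- c("red", "blue", "red", "green", "red", "blue")

# Count how often each unique value occurs
freq <- table(colors)
freq[["red"]]  # count for a single value
```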

10. Merging (Matching)


• Merge datasets based on common variables. For instance:
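
• A sketch using merge() on two hypothetical data frames that share an id column. By default only matching rows are kept, and all.x = TRUE keeps every row of the first data frame:

```r
customers <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cara"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(50, 75, 20))

inner <- merge(customers, orders, by = "id")                # ids present in both
left  <- merge(customers, orders, by = "id", all.x = TRUE)  # all customers
```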

11. Removing Duplicates


• Remove duplicate rows in a data frame:
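
• Two equivalent base R approaches, sketched on hypothetical data:

```r
mydata <- data.frame(id = c(1, 2, 2, 3), score = c(90, 85, 85, 70))

# duplicated() flags repeated rows; negate it to keep first occurrences
deduped <- mydata[!duplicated(mydata), ]

# unique() gives the same rows
deduped2 <- unique(mydata)
```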

12. Combining Columns and Rows


• Combine columns with the cbind() function and rows with the rbind() function.
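
• A minimal sketch with hypothetical data frames. cbind() expects the same number of rows, and rbind() expects the same columns:

```r
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(y = c("p", "q"))
more_rows <- data.frame(id = 3:4, x = c(30, 40))

wide <- cbind(a, b)          # adds column y next to id and x
long <- rbind(a, more_rows)  # stacks the rows of both data frames
```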

13. Combining Rows with Different Sets of Columns


• Use the smartbind() function from the “gtools” package to combine rows when column names
do not match. Install the package if necessary:
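
• The sketch below assumes the gtools package is installed and skips the call otherwise. The data frames are hypothetical, and columns missing from one input are filled with NA:

```r
df1 <- data.frame(id = 1:2, x = c(10, 20))
df2 <- data.frame(id = 3:4, y = c("p", "q"))

# install.packages("gtools")  # run once if the package is missing
if (requireNamespace("gtools", quietly = TRUE)) {
  combined <- gtools::smartbind(df1, df2)
}
```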


Filtering Data in R: 10 Useful Methods


• Filtering data in R is a fundamental data analysis task. It works like the ‘WHERE’ clause in SQL or the
‘Filter’ feature in Excel, extracting specific rows based on certain criteria. We will explore various
methods for filtering data in R, with examples for each.

Real-World Filtering Examples
• Before we dive into the methods, let’s consider some practical scenarios where data filtering
can be valuable:
• Select customers who are currently active and opened their accounts after January 1st, 2023.
• Retrieve details of customers who made more than 6 transactions in the past six months.
• Fetch information about employees who have spent over 3 years in the organization and
received the highest rating in the last 2 years.
• Analyze complaint data to identify customers who submitted more than 5 complaints in the
last year.
• Extract information about cities where the per capita income exceeds $50,000.
• We will use a dataset containing details of flights departing from NYC in 2013, with 336,776
rows and 17 columns, for our examples.

Importing Data

1. Using dplyr
• Filtering by a single condition
• Let’s say we want to select flight details for JetBlue Airways (carrier code B6).
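
• A single-condition filter on the carrier code can be sketched as follows. The small flights data frame is a hypothetical stand-in for the real dataset, the dplyr call runs only if the package is installed, and a base R equivalent is shown for comparison:

```r
# Hypothetical miniature stand-in for the flights data
flights <- data.frame(
  carrier = c("B6", "AA", "B6", "US", "UA", "B6"),
  origin  = c("JFK", "JFK", "EWR", "LGA", "EWR", "JFK"),
  dest    = c("BOS", "LAX", "FLL", "CLT", "ORD", "SFO")
)

# dplyr: keep rows where the carrier code is B6
if (requireNamespace("dplyr", quietly = TRUE)) {
  b6_dplyr <- dplyr::filter(flights, carrier == "B6")
}

# Base R equivalent
b6_base <- flights[flights$carrier == "B6", ]
```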

2. Filtering based on multiple conditions


AND condition:
• Select flight details for JetBlue Airways (carrier code B6) with JFK origin.

OR condition:
• Select flight details for JetBlue Airways (carrier code B6) OR flights with JFK origin.
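
• The AND and OR conditions can be sketched on a hypothetical stand-in for the flights data. In dplyr, conditions separated by commas are combined with AND, and | expresses OR:

```r
# Hypothetical miniature stand-in for the flights data
flights <- data.frame(
  carrier = c("B6", "AA", "B6", "US", "UA", "B6"),
  origin  = c("JFK", "JFK", "EWR", "LGA", "EWR", "JFK")
)

# Base R: & is AND, | is OR
b6_and_jfk <- flights[flights$carrier == "B6" & flights$origin == "JFK", ]
b6_or_jfk  <- flights[flights$carrier == "B6" | flights$origin == "JFK", ]

# dplyr equivalents, run only if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  and_dplyr <- dplyr::filter(flights, carrier == "B6", origin == "JFK")
  or_dplyr  <- dplyr::filter(flights, carrier == "B6" | origin == "JFK")
}
```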


Multiple OR Conditions:
• Select flight details for carriers with codes B6, US, or AA.

NOT EQUAL TO Condition:


• Remove flights with JFK origin.

NOT IN Condition:
• Remove flights with carrier codes B6, US, or AA.
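
• The three conditions above can be sketched with %in%, !=, and a negated %in%. The flights data frame is a hypothetical stand-in for the real dataset:

```r
# Hypothetical miniature stand-in for the flights data
flights <- data.frame(
  carrier = c("B6", "AA", "B6", "US", "UA", "B6"),
  origin  = c("JFK", "JFK", "EWR", "LGA", "EWR", "JFK")
)
codes <- c("B6", "US", "AA")

in_codes     <- flights[flights$carrier %in% codes, ]     # multiple OR conditions
not_jfk      <- flights[flights$origin != "JFK", ]        # NOT EQUAL TO
not_in_codes <- flights[!(flights$carrier %in% codes), ]  # NOT IN

# dplyr equivalents, run only if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  in_dplyr     <- dplyr::filter(flights, carrier %in% codes)
  not_jfk_dp   <- dplyr::filter(flights, origin != "JFK")
  not_in_dplyr <- dplyr::filter(flights, !carrier %in% codes)
}
```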

3. Filtering by Row and Column Position


• You can select specific rows and columns by their position. For example, select rows 2 to 5
and the first two columns.


4. Filtering by Row Position and Column Names


• Select rows 3 to 7 and include columns “origin” and “dest.”
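
• Methods 3 and 4 both use base R bracket indexing, written as data[rows, columns]. The sketch below assumes a hypothetical eight-row stand-in for the flights data:

```r
# Hypothetical miniature stand-in for the flights data
flights <- data.frame(
  carrier = c("B6", "AA", "B6", "US", "UA", "B6", "DL", "AA"),
  origin  = c("JFK", "JFK", "EWR", "LGA", "EWR", "JFK", "LGA", "JFK"),
  dest    = c("BOS", "LAX", "FLL", "CLT", "ORD", "SFO", "ATL", "MIA")
)

by_position <- flights[2:5, 1:2]                  # rows 2 to 5, first two columns
by_name     <- flights[3:7, c("origin", "dest")]  # rows 3 to 7, named columns
```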

5. Select Non-Missing Data


• To select rows where the “origin” column is not missing:
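
• A minimal sketch, assuming a hypothetical stand-in for the flights data in which two origin values are missing:

```r
# Hypothetical miniature stand-in for the flights data
flights <- data.frame(
  carrier = c("B6", "AA", "B6", "US", "UA", "B6"),
  origin  = c("JFK", NA, "EWR", "LGA", NA, "JFK")
)

# Base R: keep rows where origin is not NA
has_origin <- flights[!is.na(flights$origin), ]

# dplyr equivalent, run only if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  has_origin_dplyr <- dplyr::filter(flights, !is.na(origin))
}
```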

Choosing the Right Approach for Filtering


• Use dplyr: It offers intuitive syntax and a wide range of data manipulation functions, making
it suitable for interactive analysis and code readability.
• Use data.table: Opt for this when working with large datasets, optimizing for speed and
memory usage, especially for performance-critical tasks.
• Use Base R: Stick to Base R for simpler filtering tasks or when avoiding external package
dependencies.

Linear Regression
• Linear regression is a statistical method used to make predictions based on the relationship between a
response variable (often denoted as “y”) and one or more predictor variables (usually denoted as “x”).
Essentially, it helps us establish a linear connection between predictor and response variables.
• Mathematically, this relationship is expressed by the equation:
• y = ax + b
• Let’s break down what each part represents:
• y: The variable we want to predict, also known as the response variable.
• x: The variable we use to make predictions, often referred to as the predictor variable.
• a and b: Constants known as coefficients; a is the slope of the line and b is the intercept.

Steps for Performing Linear Regression:


• Data Collection: Gather a dataset with observed values for both predictor and response variables.
• Model Building: Create a model that defines the relationship using a statistical method like the lm() func-
tion in R.
• Coefficient Calculation: Find the values of coefficients (a and b) with the help of the model, forming a
mathematical equation.
• Residual Analysis: Examine the model’s summary to understand the prediction errors, known as
residuals, which are the differences between the observed and predicted values.
• Prediction: Use the model, often with the predict() function, to predict response variable values for
new data points.
• Syntax of the lm() function in R:

Let’s understand the parameters:


• Formula: A symbolic representation of the relationship between predictor(s) and the response variable.
• Data: The dataset to which the formula will be applied.

Creating a Relationship Model and Obtaining Coefficients:


• First, we create input vectors for the lm() function, incorporating our predictor and response variables. We
then store the resulting model in a variable, often named relationship_model.
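
• A minimal sketch of this step. The x and y values are hypothetical, chosen so the fitted line is close to y = 2x + 1:

```r
# Hypothetical predictor and response values
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)

# Fit y as a linear function of x
relationship_model <- lm(y ~ x)

# Extract the intercept (b) and slope (a)
coefs <- coef(relationship_model)
print(coefs)
```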


Getting a Summary of the Relationship Model:


• To assess the model’s quality, we use the summary() function:

• This summary provides insights into coefficients, residuals, and model goodness of fit.
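
• As a sketch with hypothetical data, the individual parts of the summary can also be extracted by name:

```r
# Hypothetical predictor and response values
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)
relationship_model <- lm(y ~ x)

model_summary <- summary(relationship_model)
print(model_summary)

model_summary$coefficients     # estimates, standard errors, t and p values
model_summary$r.squared        # proportion of variance explained
residuals(relationship_model)  # observed minus fitted values
```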

Using the predict() Function:


• The predict() function enables us to forecast the response variable for new data points. It
takes the model (object) and new data (newdata) as inputs:
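
• A minimal sketch with hypothetical data. The column name in newdata must match the predictor name used in the model formula:

```r
# Hypothetical predictor and response values
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)
relationship_model <- lm(y ~ x)

# Predict the response for a new value of x
new_points <- data.frame(x = 6)
predicted <- predict(relationship_model, newdata = new_points)
print(predicted)
```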

Plotting the Regression:


• We can visualize the regression line using the plot() function. In this example, we create a scatter plot and
overlay the regression line:
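
• One way to do this, sketched with hypothetical data, is to draw the points with plot() and then add the fitted line with abline(), which accepts the model object directly:

```r
# Hypothetical predictor and response values
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)
relationship_model <- lm(y ~ x)

# Scatter plot of the observations
plot(x, y, main = "Linear regression of y on x",
     xlab = "x", ylab = "y", pch = 16, col = "blue")

# Overlay the fitted regression line
abline(relationship_model, col = "red")
```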
