Introduction To R: Shanti.S.Chauhan, PH.D Business Studies Shuats

This document discusses R, an open-source programming language for statistical analysis and graphics. It provides an overview of R's history and evolution from S, describes R's interface and principles as an object-oriented programming language, and discusses advantages such as being free, interfacing with other languages, and extensive visualization capabilities. Some drawbacks mentioned are a limited graphical user interface. Overall, the document promotes R as a powerful yet accessible tool for statistics and data analysis used widely in academia and research.
INTRODUCTION TO R

Shanti.S.Chauhan,Ph.D
Business Studies
SHUATS
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
HISTORY AND EVOLUTION OF R
Origins at Bell Labs in the 1970s
HISTORY AND EVOLUTION OF R
R has developed from the S language

S Version 1

S Version 2

S Version 3

S Version 4
Developed 30 years ago for research applied to the high-tech industry
HISTORY AND EVOLUTION OF R
The regular development of R
1990s: R developed concurrently with S
1993: R made public

Acceleration of R development
• R-help and R-devel mailing lists
• Creation of the R Core Group

Source: R Journal Vol 1/2


HISTORY AND EVOLUTION OF R
Growing number of packages

2001: ~100 packages

2009: Over 2000 packages

2000: R version 1.0.1


Today: R version 2.14

Source: R Journal Vol 1/2


HISTORY AND EVOLUTION OF R
Explosion of R popularity in the last decade
• Object-oriented, growing user base, scripting features
• Free and open-source
• Irrational reasons: R seen as « cool »


HISTORY AND EVOLUTION OF R
Comparison of Mailing Lists

Traffic over time on the main mailing lists of statistical software. Source: R.A. Muenchen, r4stats.com
HISTORY AND EVOLUTION OF R
Popularity amongst programming languages

KDnuggets 2012 survey


HISTORY AND EVOLUTION OF R
Number of Blogs

Software   Number of blogs
R          365
SAS        40
Stata      8
Others     0-3

Data as of March 2012


AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
PRINCIPLE AND SOFTWARE PARADIGM
R is not really a statistical software package
• R is rather a programming language
• Limited user-friendly interfaces for data analysis
• Object-oriented and largely non-declarative
• Similar to programming languages like Fortran, C, Java, Python
PRINCIPLE AND SOFTWARE PARADIGM
R has limited Graphical User Interface (GUI) options
Recent efforts to make R more user-friendly
Several GUIs in development
R Commander
RKWard
Rattle
PRINCIPLE AND SOFTWARE PARADIGM
R Commander (RCmdr)
PRINCIPLE AND SOFTWARE PARADIGM
RKWard
PRINCIPLE AND SOFTWARE PARADIGM
Rattle
PRINCIPLE AND SOFTWARE PARADIGM
Inherent limitations of pervasive Excel-like spreadsheets

VS.
PRINCIPLE AND SOFTWARE PARADIGM
Sophisticated but costly SAS

VS.

Screenshot of SAS Enterprise Miner 7.1. Source: sas.com
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
DESCRIPTION OF R INTERFACE
R console

RGui: R's basic interface
R desktop shortcut
R command line (space to write instructions)
DESCRIPTION OF R INTERFACE
Using the command line in R console
First, an invalid sentence followed by R's error message
Second, a valid sentence
Declaration and printing of the sentence as an R object
Simple math computations
Basic information about the R object containing the sentence
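The console steps annotated on this slide could look roughly like the sketch below. The sentence "Hello world" and the object name `sentence` are illustrative assumptions, not the text typed in the original screenshot.

```r
# An unquoted sentence is parsed as code, so R reports a syntax error.
# tryCatch() is used here only so that the script keeps running.
err <- tryCatch(parse(text = "Hello world"),
                error = function(e) conditionMessage(e))
print(err)

# A quoted sentence is a valid character string.
"Hello world"

# Store the sentence in an R object, then print it.
sentence <- "Hello world"
print(sentence)

# Simple math computations.
total <- 2 + 3 * 4   # 14
print(total)

# Basic information about the object holding the sentence.
class(sentence)   # "character"
nchar(sentence)   # 11
str(sentence)
```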
DESCRIPTION OF R INTERFACE
RGui menu: File tab

File tab: usual basic and general operations
DESCRIPTION OF R INTERFACE
RGui menu: Edit tab
Edit tab: basic and general editing
Data editor: entering the object's name
Results of the data editor
DESCRIPTION OF R INTERFACE
RGui menu: View tab

View tab: viewing the toolbar and/or status bar
DESCRIPTION OF R INTERFACE
RGui menu: Misc tab

Misc tab: miscellaneous operations
DESCRIPTION OF R INTERFACE
RGui menu: Packages tab

Packages tab: adding functions to base R
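The same operations are available from the command line. The sketch below assumes the « mgcv » package as an example only because the slides mention it later; any CRAN package name could be used instead.

```r
# Hypothetical choice of package, used for illustration only.
pkg <- "mgcv"

# One-time download from CRAN (needs an internet connection):
# install.packages(pkg)

# requireNamespace() checks whether the package is installed
# without attaching it to the session.
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Attach the package so its functions can be called directly.
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; see install.packages().")
}
```

A package has to be installed once, but attached with `library()` in every new session.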
DESCRIPTION OF R INTERFACE
RGui menu: Windows tab

Windows tab: usual options to arrange the windows
DESCRIPTION OF R INTERFACE
RGui menu: Help tab
Help tab: very important links to help resources
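Help is also reachable from the command line. A minimal sketch, using `mean` as an arbitrary example function:

```r
# Open the help page of a function (same as typing ?mean at the console).
help("mean")

# List objects whose name matches a pattern, useful when the exact
# name of a function is not known.
matches <- apropos("^mean")
print(matches)

# Show the arguments a function accepts.
args(mean)

# Search the installed help pages by keyword (can be slow the first time):
# help.search("linear model")
```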
Arithmetic Operators in R
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder of division)
%/%        Integer division
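Each operator in the table can be tried directly at the console. The values 7 and 2 below are arbitrary examples.

```r
a <- 7
b <- 2

a + b     # 9    addition
a - b     # 5    subtraction
a * b     # 14   multiplication
a / b     # 3.5  division
a ^ b     # 49   exponent
a %% b    # 1    remainder of 7 divided by 2
a %/% b   # 3    integer division

# With a negative number, %/% rounds down and %% takes the sign
# of the divisor.
(-7) %/% 2   # -4
(-7) %% 2    # 1
```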
Relational Operators
Operator   Description
<          Less than
>          Greater than
<=         Less than or equal
>=         Greater than or equal
==         Equal to
!=         Not equal
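Relational operators work element by element on vectors and return logical values. A short sketch with an arbitrary example vector `x`:

```r
x <- c(1, 5, 10)

x < 5    # TRUE FALSE FALSE
x > 5    # FALSE FALSE TRUE
x <= 5   # TRUE TRUE FALSE
x >= 5   # FALSE TRUE TRUE
x == 5   # FALSE TRUE FALSE
x != 5   # TRUE FALSE TRUE

# A logical result can be used to select elements.
selected <- x[x > 2]   # 5 10

# Decimal numbers are stored approximately, so == can surprise;
# all.equal() compares with a small tolerance instead.
exact <- (0.1 + 0.2 == 0.3)                  # FALSE
close <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```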
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
ADVANTAGES OF R
R “philosophy”
• Open source code
• You can access the code of the software
• In-depth understanding of what R does
• Modify the code

Example: « mgcv » package webpage
Address of the « mgcv » package
Link to the package sources (.tar.gz file)

Screenshot of the CRAN webpage of the « mgcv » package. Source: CRAN


ADVANTAGES OF R
R access to source code
Example of source code of the “mgcv” package
1. Unzipping the mgcv_1.7-13.tar.gz file (with 7zip)
2. List of directories in the « mgcv » package (i.e. code sources)
3. List of functions (i.e. open code) in the « src » directory of the « mgcv » package

Screenshot of unzipping the « mgcv » package and browsing through the package’s files.
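Source code can also be read without leaving R. The sketch below uses the base functions `sd` and `sum` as arbitrary examples; it is not taken from the « mgcv » package shown in the screenshots.

```r
# Typing a function's name without parentheses prints its R source code.
sd

# deparse() returns the same source as text, one element per line.
src <- deparse(sd)
cat(src, sep = "\n")

# Functions implemented in compiled code only show a reference to it;
# their C source is in the R source distribution.
sum
is.primitive(sum)   # TRUE
```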
ADVANTAGES OF R
R is free

Software            Academics       Demo            Commercial (basic)   Commercial (full)
R                   Free            Free            Free                 Free
SAS                 Free to $100s   Not available   $1 000s              $10 000s
Statistica          $100s           30-day limit    ~$1 000              $10 000
Excel (Microsoft)   Free to $10s    Limited         ~$100                $100s
SPSS (IBM)          $100s           14-day limit    ~$2 000              $1 000s
ADVANTAGES OF R
Interface with other languages and scripting capabilities
Interfaces with many other programming languages
• Fortran, C, C++, Python…
• Adapt or rewrite your existing code in R
R as a scripting language
• R scripts can launch or be launched by other languages

« mgcv.c » file in the « mgcv » package, written in typical C

Screenshot of the file « mgcv.c » of the « mgcv » package open in WordPad
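As an illustration of the scripting point on this slide, the sketch below has one R session launch an external program and read its output. It assumes the `Rscript` program that ships with R as the external program, only so that the example needs nothing else installed; any other command-line program could be called the same way.

```r
# Path to the Rscript program of the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Launch a child process and capture what it prints.
# --vanilla skips user profiles so that only the result is printed.
out <- system2(rscript,
               args = c("--vanilla", "-e", shQuote("cat(1 + 1)")),
               stdout = TRUE)
print(out)
```

In the other direction, a shell script or another program can run an R script with a command of the form `Rscript my_script.R`, where `my_script.R` is a hypothetical file name.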


ADVANTAGES OF R
R visualization capabilities
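A minimal sketch of base R graphics, assuming the built-in `mtcars` data set as example data (it is not the data behind the original slide images). The plots are written to a temporary PDF file so that the code also runs outside an interactive session.

```r
# Write the plots to a temporary PDF file.
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)

# Two plots side by side.
par(mfrow = c(1, 2))

# Histogram of fuel consumption.
hist(mtcars$mpg, main = "Fuel consumption",
     xlab = "Miles per gallon", col = "grey80")

# Scatter plot with a fitted regression line.
plot(mtcars$wt, mtcars$mpg, main = "Weight vs consumption",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", pch = 19)
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# Close the file.
dev.off()
```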
ADVANTAGES OF R
R's role in academia
• R is a tool used by leading researchers
• Top-notch analytics capabilities

Screenshot of a user's Facebook map. Source: Paul Butler/Facebook, DG Rossiter, spatialanalysis.co.uk


ADVANTAGES OF R
To summarize
Free open source philosophy
• R websites with many examples
• Free books
• Free online open courses
• Twitter accounts

Online help and discussion
• Mailing lists
• Very active and diverse forums
• Communities of developers and helpers
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
DRAWBACKS OF R
Average memory performance
Poor management of large datasets
• Avoid nested loops
• Prefer R's higher-level operations on whole data structures
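The advice about loops can be sketched as follows, with a small arbitrary matrix `m`. Both versions compute the sum of squared elements; the second lets R work on the whole matrix at once.

```r
m <- matrix(1:6, nrow = 2)

# Nested loops: one R-level iteration per element.
loop_total <- 0
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    loop_total <- loop_total + m[i, j]^2
  }
}

# Vectorized equivalent: the arithmetic runs over the whole matrix.
vec_total <- sum(m^2)

# Row-wise totals without an explicit loop.
row_totals <- rowSums(m^2)

print(c(loop_total, vec_total))   # 91 91
print(row_totals)                 # 35 56
```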

Complicated structure of packages in R
• Dozens of packages
• Must be loaded into memory in every session

R packages to better manage memory
• RHadoop (inspired by Google)
• ff
• bigmemory
DRAWBACKS OF R
Average computing performance
No default parallel execution
• R packages to use several cores
• Advanced skills needed for high-performance computing
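One way to use several cores is the « parallel » package distributed with R. This is a minimal sketch, assuming two local worker processes can be started on the machine; the squaring task is a placeholder for real work.

```r
library(parallel)

# Start two worker processes on the local machine.
cl <- makeCluster(2)

# Apply a function to each input, spreading the calls over the workers.
res <- parLapply(cl, 1:4, function(i) i^2)

# Always stop the workers when done.
stopCluster(cl)

squares <- unlist(res)
print(squares)   # 1 4 9 16
```

For a task this small the cost of starting the workers outweighs any gain; parallel execution pays off only when each call does substantial work.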

A high-level programming language
• Abstract and modern (like Python…)
• More productive coding
• But further from « machine language »…
• … meaning R code can run up to about 100 times slower than C
DRAWBACKS OF R
Difficult data viewing and management
Difficult to inspect data sets

Screenshot of the R data editor and « Viewtable » tab in SAS 9.3
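Without a spreadsheet-like view, data sets are usually inspected with a few commands. A short sketch, assuming the built-in `mtcars` data set as example data:

```r
# Number of rows and columns.
size <- dim(mtcars)
print(size)   # 32 11

# First rows of the data set.
first_rows <- head(mtcars, 3)
print(first_rows)

# Type and first values of every column.
str(mtcars)

# Summary statistics of one column.
summary(mtcars$mpg)

# In an interactive session, View() opens a read-only data viewer:
# View(mtcars)
```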


DRAWBACKS OF R
Difficult architecture management
Problems for large organizations
• R is made up of several thousand independent packages
• No deployment plan for complex organizations
• No installation support

Lack of code accountability
• Thousands of independent R developers
• Nobody is responsible for the quality of the code

Potentially high hidden costs with R
• Total cost may favour commercial solutions for complex computations in large corporations
DRAWBACKS OF R
Relatively difficult to learn
Steep learning curve
• R code differs from what undergraduate computer science courses teach
• Very complex data structures (useful once mastered)
• Is R's syntax illogical?
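The data structures mentioned above can be sketched with a few lines. All names and values below are made up for illustration.

```r
# Numeric vector: values of a single type.
scores <- c(12.5, 15, 9)

# Factor: a categorical variable with a fixed set of levels.
groups <- factor(c("b", "a", "b"))
levels(groups)   # "a" "b"

# List: elements of different types and lengths.
study <- list(title = "pilot", scores = scores)

# Data frame: a table whose columns are vectors of equal length.
results <- data.frame(id = 1:3, score = scores, group = groups)

# Rows with a score above 10.
high <- results[results$score > 10, ]
print(high)

# Mean score per group.
group_means <- tapply(results$score, results$group, mean)
print(group_means)
```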

Still, not more difficult to learn than SAS
• Both SAS and R are more abstract than basic programming languages (Fortran, C…)
• Difficult to learn = more rewarding professionally!
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
SO WHY LEARN R?
More positive than negative points
No language is perfect!
• Contradictory objectives to meet
• Strengths and weaknesses of each language

Effect of legacy and the culture of the organization
• Use existing solutions (system architecture, BA tools…)
• Habits in business analytics

Different needs imply different tools
• Large corporations + defined procedures → SAS-like
• Fewer financial resources + quick proof of concept → R
SO WHY LEARN R?
Very appealing solution

[Figure: heat map of software popularity. Rows: R, SAS, IBM SPSS, STATISTICA, own code. Columns: Overall, Corporate, Consultants, Academics, NGO/Gov't.]
Popularity of business analytics software (green = very popular, red = unpopular). Source: Rexer Analytics
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
REFERENCES FOR LEARNING R
Books
Many books available: choose the one that fits you!
• Style, pedagogy, theory vs practice
• Browse several books at a local library or store

Springer's Use R! series (http://www.springer.com/series/6991)
• Recent, concise, good quality, affordable, diverse

Complete beginners: « A Beginner's Guide to R », « R by Example »

One step forward: « Business Analytics for Managers »

Intensive Excel users: « R through Excel »

O'Reilly R series (for programmers)
« R Cookbook », « R in a Nutshell »
REFERENCES FOR LEARNING R
Websites
R official websites
• The R Project for Statistical Computing (www.r-project.org)
• Mailing lists (« R-help », Special Interest Groups) and The R Journal
• Official (austere) manuals (« An Introduction to R »)

Other websites
• UCLA online R resources (http://www.ats.ucla.edu/stat/r/)
• R blogs aggregator (www.r-bloggers.com)
• Social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards (Analytical Bridge…)
REFERENCES FOR LEARNING R
Conferences
Growing number of conferences about R
Official international R conference: useR!
• Held annually over a few days, in a new venue each time (Google it!)
• Lots of materials about many topics

Other conferences or venues
• Conferences about business analytics (data mining, specialized topics…) with sessions involving R
• Find (or even start!) an R user group close to your location (R Wiki geographical list, map of groups on « meetup.com »)
• Events and news from the R-bloggers blog
