R Programming Language for Data Analytics

Presentation · December 2022


DOI: 10.13140/RG.2.2.15044.09607/1


All content following this page was uploaded by Ahmed Elshahhat on 06 December 2022.



Cairo University
Faculty of Graduate Studies for Statistical Research
The 55th Annual International Conference on Data Science

Workshop

R Programming Language for Data Analytics

By

Dr. Ahmed Elshahhat


PhD in Statistics, Cairo University, Giza, Egypt
Lecturer of Statistics, Information Systems Dep.,
Faculty of Technology and Development,
Zagazig University

December 2022

DOI: 10.13140/RG.2.2.15044.09607
Overview
Data Structures
R Statistics
R Graphics
Inference

To the late Prof. Samir K. Ashour

Professor of Mathematical Statistics,


Cairo University, Giza, Egypt

1943-2022

1 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics



Dr. Ahmed Elshahhat is currently a Lecturer of Statistics in the Department
of Information Systems, Faculty of Technology and Development, Zagazig
University, Zagazig, Egypt. He received his MSc in Mathematical Statistics
from Cairo University, Giza, Egypt, in 2016, and his PhD in Applied
Statistics and Econometrics from the same university in 2019. He has
contributed to the areas of distribution theory, reliability theory,
Bayesian inference, Markov chain Monte Carlo techniques, parametric
inference, the design of new censoring plans, the generation of new
families of distributions, and R programming.


To become a perfect statistician

Try to be a perfect programmer

▶ Dr. Ahmed Elshahhat

Outline

1 Overview

2 Data Structures

3 R Statistics

4 R Graphics

5 Inference


R is a powerful programming environment that provides a scripting language for
data handling, data visualization, and statistics, with excellent graphical support.
This workshop gives you the basic tools to start exploring the R environment, and
all it has to offer, by yourself. This repository contains teaching materials for
a 2-3 hour, hands-on workshop, "R Programming Language for Data Analytics".

Learning Objectives

1 An Overview of R Programming Language:
    What is R?
    R Advantages
    R Restrictions
    Installing R System
    R Programming Tools
    Top 10 R Programming Books, Courses, Online Resources
2 Data Structures:
    Vectors
    Matrices
    Arrays
3 R Statistics
4 R Graphics:
    Bar & Box
    Histogram & Density
    Heatmap
    Pairs
    QQ
    3D
5 Inference:
    Parameter Estimation
    Monte Carlo of Parameter Estimation
    Linear Regression Models
    Monte Carlo of Linear Regression Models


Schedule, Dataset & Installation Requirements

Schedule:
Activity           Time (in minutes)
Overview           30
Data Structures    35
R Statistics       20
R Graphics         35
Inference          60
Dataset: All R scripts used in this workshop are available within
these slides.
Installation Requirements: Download and install the latest version of R.



Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources


Data Science: Python vs SAS vs R

Python is a free, open-source, general-purpose programming language
that has become very popular in data science. It is easy to learn and
understand, and it is used by major companies such as Google, Quora,
and Reddit.
SAS has long been regarded as one of the leaders in the field of data
science. It is also easy to learn, but it is not open source and ends
up being an expensive option for a beginner. It is used by large
corporations such as Nestle, Barclays, Volvo, and HSBC.
R is a very popular language for statistics. It is a counterpart of SAS
and is free and open source. It is mainly used in academia and
research.
So, the question is not which one to choose, but how to make the best
use of these programming languages for your specific use cases.
More details can be found on the Mindmajix training platform: https://mindmajix.com/python-vs-sas-vs-r


In July 2016, Burtch Works’ HR department asked over 1,000 quantitative
professionals which language they preferred: SAS, R, or Python.

SAS is expensive commercial software, used mostly by large corporations
with huge budgets, while Python and R are free software that anyone can
download.
More details can be found on the Edvancer website: https://edvancer.in/r-python-or-sas-which-one-should-you-learn-first/


Overview
What is R?

1 R is a programming language for data analysis and graphics.
2 R was initially written by Ross Ihaka and Robert Gentleman at the
Department of Statistics of the University of Auckland, New Zealand,
during the 1990s, and has been developed with contributions from
all over the world since mid-1997.
3 All information about R can be found at http://www.R-project.org
4 The R system has two major components:
    1. Base system: contains the R language software.
    2. User-contributed add-on packages.


Overview
R Advantages

1 R is open source.
2 R has a large community.
3 R produces outstanding graphical output.
4 R is easy to learn and understand.
5 More than 18,000 packages are available for free.
6 R works well on macOS, Linux, and Microsoft Windows.
7 R is cross-platform and runs on many operating systems.
8 R is excellent for simulation, programming, computer-intensive
analyses, etc.
9 Anyone is welcome to provide bug fixes, code enhancements,
and new packages.
10 Built-in help for base functions is available without an internet connection.


Overview
R Restrictions

1 R has a steep initial learning curve.
2 It is easy to make a mistake in R and not notice it.
3 The quality of some packages is less than perfect.
4 There is no commercial support.
5 There is no one to complain to if something doesn’t work.
6 Working with large data sets is limited by RAM.
7 R is developed by many people who devote their own time to it.
8 R commands give little thought to memory management.
9 R can consume all available memory.
10 Data preparation and organization can be messier and more
mistake-prone in R than in SPSS or SAS.


Overview
Installing R System

1 Go to the official R site, https://www.r-project.org/.
2 Click the CRAN link in the left sidebar.
3 Select a mirror.
4 Choose your operating system from the list (Linux, macOS, or Windows).
5 Click the link that downloads the base distribution.
6 Run the file and follow the instructions to install R.
7 Start R. Have fun!


Overview
Installing add-on Packages

1 All packages are available at: https://cloud.r-project.org/web/packages/
2 Pick a package from the list and download it.
3 To install and load an add-on package:
    1. install.packages("package name")
    2. library("package name")
4 Verify that the package is installed with
    "package name" %in% rownames(installed.packages())
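
The install, load, and verify steps above can be combined into one small helper. This is only a sketch: the function name ensure_package is hypothetical (it is not part of R), and the call to install.packages() assumes an internet connection and a reachable CRAN mirror. The base package "stats" is used in the example because it ships with every R installation.

```r
# Hypothetical helper (not part of base R): install a package only when it
# is missing, load it, and report whether it is installed.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)  # load the package by its name string
  # Exact match on the package name, rather than a partial pattern match
  pkg %in% rownames(installed.packages())
}

installed_ok <- ensure_package("stats")  # "stats" ships with R
print(installed_ok)
```

Matching the name exactly against rownames(installed.packages()) avoids false positives that a pattern search could give when one package name is contained in another.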


R Session
R Console

R Console: displays the outputs, which are usually not saved.


R Session
R Editor

R Editor (File > New script): scripts are typed here and saved as text documents.


R Programming Tools
Arithmetic Operators
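
As a quick illustration, the standard arithmetic operators of base R can be tried directly at the console. The variable names x and y, and the names that store the results, are chosen here only for illustration.

```r
x <- 7
y <- 2

s  <- x + y     # addition: 9
d  <- x - y     # subtraction: 5
p  <- x * y     # multiplication: 14
q  <- x / y     # division: 3.5
pw <- x ^ y     # exponentiation: 49
r  <- x %% y    # modulus (remainder): 1
iq <- x %/% y   # integer division: 3

print(c(s, d, p, q, pw, r, iq))
```

All of these operators are vectorized, so the same expressions work element by element when x and y are vectors.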


R Programming Tools
Commonly Used Functions

Function             Description
table                counts of each value
c                    concatenate values
print                show a value
which                indices of TRUE elements
length               number of values
summary              generic summary statistics
dim                  dimensions of a matrix
min, max             minimum, maximum
help(), ?            show documentation
rbind, cbind         bind vectors as rows, as columns
class                class (type) of an object
apply                apply a function over rows or columns
sort, order, rank    sorted values, sorting permutation, ranks
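
A short sketch of several of these functions applied to a small vector; the data and the variable names are made up for illustration.

```r
x <- c(4, 1, 3, 1, 2)          # c: concatenate values into a vector

n        <- length(x)          # number of values: 5
counts   <- table(x)           # how often each value occurs
ones     <- which(x == 1)      # indices where the condition is TRUE: 2 4
sorted   <- sort(x)            # 1 1 2 3 4
perm     <- order(x)           # permutation that sorts x: 2 4 5 3 1

m        <- rbind(x, 2 * x)    # bind two vectors as rows of a 2 x 5 matrix
d        <- dim(m)             # 2 5
row_sums <- apply(m, 1, sum)   # apply sum over rows (margin 1): 11 22

print(summary(x))              # generic summary statistics
print(class(x))                # "numeric"
```

Typing ?sort or help(sort) at the console opens the documentation for any of these functions.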

R Programming Tools
Commonly Used Functions, Cont’d

Function                  Description
mean(x)                   average
var(x)                    variance
cor(x, y)                 correlation
cov(x, y)                 covariance
sqrt(x)                   square root
log10(x)                  logarithm to base 10
sin(x), cos(x), tan(x)    trigonometric functions
log(x)                    natural logarithm
seq(x)                    sequence generation
median(x)                 middle value of the sorted data
mad(x)                    median absolute deviation
d, p, q, r prefixes       density, cumulative probability, quantile, random number generation
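
A minimal sketch of the summary and mathematical functions above, using made-up data. Here y is constructed as an exact linear function of x, so the correlation is 1 by design.

```r
x <- c(1, 2, 3, 4, 10)
y <- 2 * x + 1              # exact linear function of x

m   <- mean(x)              # 4
med <- median(x)            # 3
v   <- var(x)               # sample variance: 50 / 4 = 12.5
r   <- cor(x, y)            # 1, because y is linear in x
cxy <- cov(x, y)            # 2 * var(x) = 25

root <- sqrt(16)            # 4
l10  <- log10(1000)         # 3
ln   <- log(exp(1))         # natural logarithm: 1
s    <- seq(1, 10, by = 3)  # 1 4 7 10

print(c(mean = m, median = med, var = v, cor = r, cov = cxy))
```

Note that the median (3) is far less affected by the outlying value 10 than the mean (4), which is why both are often reported together.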

R Programming Tools
Probability Distribution Functions

Distribution         Functions
Beta                 pbeta, qbeta, dbeta, rbeta
Binomial             pbinom, qbinom, dbinom, rbinom
Cauchy               pcauchy, qcauchy, dcauchy, rcauchy
Chi-Square           pchisq, qchisq, dchisq, rchisq
Exponential          pexp, qexp, dexp, rexp
F                    pf, qf, df, rf
Gamma                pgamma, qgamma, dgamma, rgamma
Geometric            pgeom, qgeom, dgeom, rgeom
Hypergeometric       phyper, qhyper, dhyper, rhyper
Logistic             plogis, qlogis, dlogis, rlogis
Log-Normal           plnorm, qlnorm, dlnorm, rlnorm
Negative Binomial    pnbinom, qnbinom, dnbinom, rnbinom
Normal               pnorm, qnorm, dnorm, rnorm
Poisson              ppois, qpois, dpois, rpois
Student-t            pt, qt, dt, rt
Uniform              punif, qunif, dunif, runif
Studentized Range    ptukey, qtukey
Weibull              pweibull, qweibull, dweibull, rweibull
Wilcoxon Rank Sum    pwilcox, qwilcox, dwilcox, rwilcox

For more details about R functions (with examples), see https://statisticsglobe.com/
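
The d, p, q, r naming scheme can be illustrated with the normal and binomial families. This is a small sketch; the seed value and the variable names are arbitrary choices made for the example.

```r
dens <- dnorm(0)            # d: density of N(0, 1) at 0, equal to 1 / sqrt(2 * pi)
prob <- pnorm(1.96)         # p: cumulative probability P(Z <= 1.96), about 0.975
quan <- qnorm(0.975)        # q: quantile, the inverse of pnorm, about 1.96

set.seed(123)               # arbitrary seed, only to make the draws reproducible
z <- rnorm(5)               # r: five random draws from N(0, 1)

# Binomial example: P(X <= 2) for X ~ Binomial(10, 0.5) = 56 / 1024
pb <- pbinom(2, size = 10, prob = 0.5)

print(c(dens = dens, prob = prob, quan = quan, pb = pb))
```

The same four prefixes apply to most families in the table, for example dexp, pexp, qexp, and rexp for the exponential distribution.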


Top 10 R Programming Books

The book listed first is not necessarily better than the others. In our
opinion, they all deserve inclusion on the list.

#1 R in Action. For details see Kabacoff (2015).
#2 R for Data Science. For details see Wickham and Grolemund (2016).
#3 The Art of R Programming. For details see Matloff (2011).
#4 Hands-On Programming with R. For details see Grolemund (2014).
#5 R Graphics Cookbook. For details see Chang (2018).
#6 The Big R-Book. For details see De Brouwer (2020).
#7 Practical Data Science with R. For details see Mount and Zumel (2019).
#8 R for Everyone. For details see Lander (2014).
#9 The Book of R. For details see Davies (2016).
#10 The R Book. For details see Crawley (2012).


Top 10 R Programming Courses

The best online courses to learn R programming, the language used by
data analysts and statisticians to structure, analyze, and visualize data.

Data Analysis with R Programming (Google)
https://www.coursera.org/learn/data-analysis-r

R Programming Fundamentals (Stanford University)
https://online.stanford.edu/courses/xfds112-r-programming-fundamentals

Data Science: R Basics (Harvard University)
https://pll.harvard.edu/course/data-science-r-basics?delta=0

Data Analysis with R (Facebook)
https://www.facebook.com/groups/2101100100212657/

The Analytics Edge (Massachusetts Institute of Technology)
https://www.classcentral.com/course/edx-the-analytics-edge-1623

Introduction to R (DataCamp)
https://www.datacamp.com/courses/free-introduction-to-r

Swirl: Learn R
https://swirlstats.com/

Introduction to Business Analytics with R (University of Illinois)
https://www.coursera.org/learn/business-analytics-r

Introduction to Probability and Data with R (Duke University)
https://www.coursera.org/learn/probability-intro

R Programming A-Z (Udemy)
https://www.udemy.com/course/r-programming


Top 10 R Programming Online Resources


In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#1 R-bloggers

Note: The R-bloggers website comprises the efforts of more than 750 R bloggers.
https://www.r- bloggers.com/

53 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources

Top 10 R Programming Online Resources


In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#2 Microsoft R Application Network (Revolution R Open)

Note: In 2015, Microsoft acquired Inside-R’s parent company Revolution Analytics. One result of this
acquisition is the Microsoft R Application Network, (MRAN).
https://mran.microsoft.com/

54 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources

Top 10 R Programming Online Resources


In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#3 Quick-R

Note: Professor Rob Kabacoff at Wesleyan University created this website to introduce you to R and its
applications.
https://www.statmethods.net/

55 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources

Top 10 R Programming Online Resources


In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#4 RStudio

Note: RStudio is an online learning page that links to tutorials and examples to help you master R and
related tools.
https://www.rstudio.com/

56 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources

Top 10 R Programming Online Resources


In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#5 Statistics Globe

Note: Statistics Globe is an education platform that provides free programming tutorials in R and Python
as well as theoretical explanations for the field of statistics and data science.
https://statisticsglobe.com/

57 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview What is R?
Data Structures R Restrictions
R Statistics R Installation
R Graphics R Programming Tools
Inference Top 10 R Programming Books, Courses, Online Resources

Top 10 R Programming Online Resources

In R, you might run into a situation or two that requires some expert help.
The websites listed can provide the assistance you need.

#6 Stack Overflow

Note: Stack Overflow is a multimillion-member community of programmers dedicated to helping each
other. You can search their Q&A base for help with a problem, or you can ask a question.
https://stackoverflow.com/


#7 R Tutorial

Note: R Tutorial is designed for software programmers, statisticians and data miners who want to
develop statistical software using R.
https://www.tutorialspoint.com/r/index.htm


#8 R Programming Tutorial

Note: R Programming Tutorial is designed for both beginners and professionals. It covers the basic
and advanced concepts of data analysis and visualization.
https://www.javatpoint.com/r-tutorial


#9 RDocumentation

Note: RDocumentation enables you to search for R packages and functions that suit your needs.
https://www.rdocumentation.org/


#10 R Manuals

Note: If you want to go directly to the source, visit the R manuals page.
https://cran.r-project.org/manuals.html


Outline

1 Overview

2 Data Structures

3 R Statistics

4 R Graphics

5 Inference


R Data Types
R has six basic data types: logical, numeric (double), integer,
complex, character and raw.


R Data Types

Data Types

print("abc") # Character
[1] "abc"

print(5L) # Integer (a bare 5 is stored as numeric)
[1] 5

print(c(10, 20)) # Numeric
[1] 10 20


R Data Types

Data Types, Cont’d

print(TRUE) # Logical
[1] TRUE

print(2+3i) # Complex
[1] 2+3i

print(charToRaw("hello")) # Raw
[1] 68 65 6c 6c 6f

Note: The charToRaw() command converts each character to its American Standard Code for Information
Interchange (ASCII) byte value, printed in hexadecimal.
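
The type of any value can be checked directly with typeof(). The sketch below is only an illustration (the variable name types is an assumption, not part of the workshop code); it shows that a bare number such as 5 is stored as a double, and that the L suffix is needed to obtain an integer.

```r
# Inspect how R stores a few literal values
typeof(5)      # a bare number is a double (numeric)
typeof(5L)     # the L suffix requests an integer
class(5)       # class() reports doubles as "numeric"

# Collect the type of one example value per basic data type
# (`types` is a hypothetical name used only for this illustration)
types <- sapply(list(5, 5L, "abc", TRUE, 2+3i, charToRaw("hello")), typeof)
print(types)
```

The result lists one type name per value, in the order double, integer, character, logical, complex, raw.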


R Data Structures
R has a wide variety of data structures, including vectors, factors, matrices,
arrays, data frames, and lists.


R Data Structures
Vectors

A vector is the basic data structure in R. All of its elements share one of
six types: logical, integer, double, complex, character or raw.
Vectors

x <- 5:15; print (x) # Sequence from 5 to 15


[1] 5 6 7 8 9 10 11 12 13 14 15

x <- seq (5 ,15); print(x) # Sequence from 5 to 15


[1] 5 6 7 8 9 10 11 12 13 14 15

x <- 5.5:12.5; print(x) # Sequence from 5.5 to 12.5


[1] 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5


R Data Structures
Vectors

Vectors, Cont’d

x <- 5.5:13; print (x) # Sequence 5.5 to 13


[1] 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5

x <- seq (2,6,by =0.5); print(x) # Sequence 2 to 6 by 0.5


[1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0

x <- seq (2 ,6 ,0.5); print(x) # Sequence 2 to 6 by 0.5


[1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
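
As a complement to the sequences above, seq() can also be given the number of points instead of the step size. The following is a minimal sketch; the variable names a, b, s and r are assumptions chosen for the illustration.

```r
# Same sequence specified two ways: by step size and by number of points
a <- seq(2, 6, by = 0.5)
b <- seq(2, 6, length.out = 9)
print(all.equal(a, b))

# seq_len(n) builds 1, 2, ..., n
s <- seq_len(5)
print(s)

# The colon operator steps by 1 from the start and never passes the end value
r <- 5.5:13
print(r)
```

The last line illustrates why 5.5:13 and 5.5:12.5 print the same vector: the colon operator starts at 5.5, moves in steps of 1, and stops before exceeding the upper limit.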


R Data Structures
Vectors

Vectors, Cont’d

x <- c(1 ,2 ,3 ,4 ,5); y <- c(6 ,7 ,8 ,9 ,10) # Vectors x and y


x; y
[1] 1 2 3 4 5
[1] 6 7 8 9 10

x[2]; y[2] # Access 2nd element in x and y


[1] 2
[1] 7

x[ -2]; y[ -2] # Exclude 2nd element in x and y


[1] 1 3 4 5
[1] 6 8 9 10


R Data Structures
Vectors

Vectors, Cont’d

x <- c(1 ,2 ,3 ,4 ,5); y <- c(6 ,7 ,8 ,9 ,10) # Vectors x and y


x; y
[1] 1 2 3 4 5
[1] 6 7 8 9 10

x[c(1 ,5) ]; y[c(1 ,5)] # Access (1st ,5th) items in x,y


[1] 1 5
[1] 6 10

x[-c(1 ,5) ]; y[-c(1 ,5)] # Exclude (1st ,5th) items in x,y


[1] 2 3 4
[1] 7 8 9


R Data Structures
Vectors

Vectors, Cont’d
x; y
[1] 1 2 3 4 5
[1] 6 7 8 9 10

x[1]= -1; x # Replace 1st item in x


[1] -1 2 3 4 5
y[c(1 ,5) ]=c(10 ,100); y # Replace (1st ,5th) items in y
[1] 10 7 8 9 100

A=x+y; A # Vector addition


[1] 9 9 11 13 105
B=x-y; B # Vector subtraction
[1] -11 -5 -5 -5 -95


R Data Structures
Vectors

Vectors, Cont’d
x;y
[1] 1 2 3 4 5
[1] 6 7 8 9 10

x+5; y-6 # Add 5 to x; subtract 6 from y


[1] 6 7 8 9 10
[1] 0 1 2 3 4

x*5; y/2 # Multiply x by 5; divide y by 2


[1] 5 10 15 20 25
[1] 3.0 3.5 4.0 4.5 5.0

73 / 172 Dr. Ahmed Elshahhat R Programming Language for Data Analytics


Overview
Data Structures Vectors
R Statistics Matrices
R Graphics Arrays
Inference

R Data Structures
Vectors

Vectors, Cont’d
x;y
[1] 1 2 3 4 5
[1] 6 7 8 9 10

x[x <5]; x[x >=2]; x[x <1] # Access elements from x


[1] 1 2 3 4
[1] 2 3 4 5
numeric (0)

y^2; sqrt(y) # Square of y; square root of y


[1] 36 49 64 81 100
[1] 2.449490 2.645751 2.828427 3.000000 3.162278
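
Expressions such as x[x < 5] work in two steps: the comparison first produces a logical vector, and that vector then selects the elements. A small sketch of this mechanism follows (the names mask, kept, pos and both are assumptions used only here).

```r
x <- c(1, 2, 3, 4, 5)

# Step 1: the comparison yields one TRUE/FALSE per element
mask <- x < 5
print(mask)

# Step 2: indexing with the logical vector keeps the TRUE positions
kept <- x[mask]
print(kept)

# which() returns the positions instead of the values
pos <- which(x >= 2)
print(pos)

# Conditions can be combined with & (and) or | (or)
both <- x[x >= 2 & x < 5]
print(both)
```

When no element satisfies the condition, the result is an empty vector, which R prints as numeric(0).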


R Data Structures
Vectors

Vectors, Cont’d
data <- rep(c(2 ,4 ,6) , times =3)
print(data) # Repeat vector 3 times
[1] 2 4 6 2 4 6 2 4 6

data <- rep(c(2 ,4 ,6) , each =3)


print(data) # Repeat each item 3 times
[1] 2 2 2 4 4 4 6 6 6

data <- rep(seq (1 ,3 ,0.5) , times =2)


print(data) # Repeat sequence 2 times
[1] 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0


R Data Structures
Vectors

Vectors, Cont’d
for (i in seq (1 ,3 ,0.5)) {
print(i) # Sequence 1 to 3 by 0.5 separately
}
[1] 1
[1] 1.5
[1] 2
[1] 2.5
[1] 3


R Data Structures
Vectors

Vectors, Cont’d
data <- c(1 ,2 ,3 ,4 ,5 ,6)

for (i in data) {
if (i %% 2 == 0)
print (i) # Print even integers
}
[1] 2
[1] 4
[1] 6


R Data Structures
Vectors

Vectors, Cont’d
data <- c(1 ,2 ,3 ,4 ,5 ,6)

for (i in data) {
if (i %% 2 == 1)
print (i) # Print odd integers
}
[1] 1
[1] 3
[1] 5
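
The two loops above can also be written without a loop, because %% and == operate on whole vectors. This is a minimal sketch of that vectorized alternative (the names evens and odds are assumptions for the illustration).

```r
data <- c(1, 2, 3, 4, 5, 6)

# The remainder test is applied to every element at once
evens <- data[data %% 2 == 0]
odds  <- data[data %% 2 == 1]

print(evens)
print(odds)
```

The vectorized form returns the selected values as a single vector, whereas the loop prints them one at a time.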


R Data Structures
Matrices

A matrix is a two-dimensional data structure in which data are arranged
into rows and columns. In R, a matrix is created with the
matrix() function:
matrix(x, nrow, ncol, byrow) # Create a matrix
x - data items of the same type
nrow - number of rows
ncol - number of columns
byrow (optional) - if TRUE, the matrix is filled row-wise.
By default (byrow = FALSE), the matrix is filled column-wise.
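
The effect of the byrow argument is easiest to see by filling the same data both ways. The sketch below uses hypothetical names (by_col, by_row) and a small vector chosen only for the illustration.

```r
x <- 1:6

# Default: values run down the first column, then the second, and so on
by_col <- matrix(x, nrow = 2, ncol = 3)
print(by_col)

# byrow = TRUE: values run across the first row, then the second
by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)
```

Both matrices hold the same six values; only their arrangement differs.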


R Data Structures
Matrices

Matrices
x = c(9 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8)
A = matrix (x ,3 ,3); print(A) # Create a 3x3 matrix
[,1] [ ,2] [ ,3]
[1,] 9 3 6
[2,] 1 4 7
[3,] 2 5 8

A = matrix (x,nrow =3, ncol =3); print(A)


[,1] [ ,2] [ ,3]
[1,] 9 3 6
[2,] 1 4 7
[3,] 2 5 8


R Data Structures
Matrices

Matrices, Cont’d
A = matrix (c(9 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8) ,nrow =3, ncol =3); print(A)
[,1] [ ,2] [ ,3]
[1,] 9 3 6
[2,] 1 4 7
[3,] 2 5 8

dim(A) # Dimension of A
[1] 3 3

det(A) # Determinant of A
[1] -27

diag(A) # Diagonal elements of A


[1] 9 4 8


R Data Structures
Matrices

Matrices, Cont’d

sum(diag(A)) # Trace of A
[1] 21

A[ ,1]; A[1 ,] # Access 1st column ; 1st row of A


[1] 9 1 2
[1] 9 3 6

A[2, 2] # Access the element in the 2nd row, 2nd column


[1] 4


R Data Structures
Matrices

Matrices, Cont’d
A[ ,1]; A[1 ,]
[1] 9 1 2
[1] 9 3 6

matrix (A[ ,1] ,3 ,1) # Access 1st column as matrix


[,1]
[1,] 9
[2,] 1
[3,] 2

matrix (A[1 ,] ,1 ,3) # Access 1st row as matrix


[,1] [ ,2] [ ,3]
[1,] 9 3 6


R Data Structures
Matrices

Matrices, Cont’d
cbind(A[1 ,]) # Transpose 1st row to a column
[,1]
[1,] 9
[2,] 3
[3,] 6

rbind(A[ ,1]) # Transpose 1st column to a row


[,1] [ ,2] [ ,3]
[1,] 9 1 2

A[ ,1]=c(2 ,0 ,2); A # Replace 1st column by other items


[,1] [ ,2] [ ,3]
[1,] 2 3 6
[2,] 0 4 7
[3,] 2 5 8

R Data Structures
Matrices

Matrices, Cont’d
A = matrix(c(9, 1, 2, 3, 4, 5, 6, 7, 8), 3, 3) # Restore the original A
colSums(A); rowSums(A) # Sum columns; sum rows of A
[1] 12 12 21
[1] 18 12 15

t(A) # Transpose A
[,1] [ ,2] [ ,3]
[1,] 9 1 2
[2,] 3 4 5
[3,] 6 7 8

solve(A) # Inverse matrix of A


[ ,1] [,2] [,3]
[1,] 0.1111111 -0.2222222 0.1111111
[2,] -0.2222222 -2.2222222 2.1111111
[3,] 0.1111111 1.4444444 -1.2222222
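
A quick way to check an inverse is to multiply it back by the original matrix, which should give the identity matrix up to rounding error. The sketch below also shows solve() with a second argument, which solves a linear system directly; the right-hand side b is a hypothetical vector chosen only for this illustration.

```r
A <- matrix(c(9, 1, 2, 3, 4, 5, 6, 7, 8), nrow = 3, ncol = 3)

# Inverse of A, then check that A %*% A_inv is the 3x3 identity
A_inv <- solve(A)
check <- A %*% A_inv
print(all.equal(check, diag(3)))

# Solve the linear system A %*% sol = b without forming the inverse
b <- c(1, 2, 3)   # hypothetical right-hand side
sol <- solve(A, b)
print(all.equal(as.vector(A %*% sol), b))
```

all.equal() is used instead of == because floating-point arithmetic leaves tiny rounding differences.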

R Data Structures
Matrices

Matrices, Cont’d
x = c(9 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8) # Data x
y = c(0 ,2 ,4 ,6 ,8 ,10 ,12 ,14 ,16) # Data y

A = matrix (x,nrow =3, ncol =3); A # Matrix A


[,1] [ ,2] [ ,3]
[1,] 9 3 6
[2,] 1 4 7
[3,] 2 5 8

B = matrix (y,nrow =3, ncol =3); B # Matrix B


[,1] [ ,2] [ ,3]
[1,] 0 6 12
[2,] 2 8 14
[3,] 4 10 16


R Data Structures
Matrices

Matrices, Cont’d

Z=A+B; Z # Add matrix A to B


[,1] [ ,2] [ ,3]
[1,] 9 9 18
[2,] 3 12 21
[3,] 6 15 24

Z=A-B; Z # Subtract matrix B from A


[,1] [ ,2] [ ,3]
[1,] 9 -3 -6
[2,] -1 -4 -7
[3,] -2 -5 -8


R Data Structures
Matrices

Matrices, Cont’d

Z=A/B; Z # Divide A by B element-wise (9/0 gives Inf)


[,1] [ ,2] [ ,3]
[1,] Inf 0.5 0.5
[2,] 0.5 0.5 0.5
[3,] 0.5 0.5 0.5


R Data Structures
Matrices

Matrices, Cont’d

Z=A*B; Z # Multiply A and B element-wise


[,1] [ ,2] [ ,3]
[1,] 0 18 72
[2,] 2 32 98
[3,] 8 50 128

Z=A%*%B; Z # Matrix product of A and B


[,1] [ ,2] [ ,3]
[1,] 30 138 246
[2,] 36 108 180
[3,] 42 132 222
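
The two kinds of multiplication have different dimension requirements: * needs matrices of identical shape, while %*% needs the number of columns of the first matrix to equal the number of rows of the second. A minimal sketch with small hypothetical matrices (not the A and B of the slides) follows.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
B <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
P <- A %*% B
print(dim(P))
print(P)

# Element-wise product needs equal shapes, so B is transposed first
E <- A * t(B)
print(E)

# The two are linked: each diagonal entry of P is a row sum of E
print(all.equal(diag(P), rowSums(E)))
```

Calling A * B with these shapes would stop with a "non-conformable arrays" error, which is a common symptom of mixing up the two operators.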


R Data Structures
Arrays

An array is a data structure that stores data of the same type in any number of
dimensions; a matrix is the two-dimensional case. In R, an array is created with the
array() function:
array(x, dim = c(nrow, ncol, nmat)) # Create an array
x - data items of same type
nrow - number of rows
ncol - number of columns
nmat - number of matrices


R Data Structures
Arrays

Arrays

A=array(c (1:12) , dim = c(2 ,3 ,2))


print(A) # Create two matrices 2x3
, , 1

[,1] [ ,2] [ ,3]


[1,] 1 3 5
[2,] 2 4 6

, , 2

[,1] [ ,2] [ ,3]


[1,] 7 9 11
[2,] 8 10 12


R Data Structures
Arrays

Arrays, Cont’d

A[,,2] # Access matrix #2


[,1] [ ,2] [ ,3]
[1,] 7 9 11
[2,] 8 10 12

A[,3,1] # Access 3rd column in matrix #1


[1] 5 6

A[2 ,3 ,2] # Access 2nd item of 3rd col. in matrix #2


[1] 12

12 %in% A[,,2] # Check whether 12 is an element of matrix #2


[1] TRUE


R Data Structures
Arrays

Arrays, Cont’d
A1 <- matrix(c(1:6), 2, 3, byrow = TRUE) # Matrix A1
A2 <- matrix(c(-1:-6), 2, 3, byrow = TRUE) # Matrix A2
col.names <- c("COL1", "COL2", "COL3") # Column names
row.names <- c("ROW1", "ROW2") # Row names
mat.names <- c("Matrix1", "Matrix2") # Matrix names

Array <- array(c(A1, A2), dim = c(2, 3, 2),
  dimnames = list(row.names, col.names, mat.names)
) # Combine A1 and A2 into an array
print(Array)


R Data Structures
Arrays

Arrays, Cont’d
, , Matrix1

COL1 COL2 COL3


ROW1 1 2 3
ROW2 4 5 6

, , Matrix2

COL1 COL2 COL3


ROW1 -1 -2 -3
ROW2 -4 -5 -6


R Data Structures
Arrays

Arrays, Cont’d
Mat <- Array [,,1] + Array [,,2] # Add arrays
print(Mat)
COL1 COL2 COL3
ROW1 0 0 0
ROW2 0 0 0

Transpose.1 <- t(Array[,,1]) # Transpose the 1st matrix of the array
print(Transpose.1)
ROW1 ROW2
COL1 1 4
COL2 2 5
COL3 3 6


R Data Structures
Arrays

Arrays, Cont’d

Res.1 <- apply(Array, c(1), sum) # Sum each row across both matrices
print(Res.1)
ROW1 ROW2
0 0

Res.2 <- apply(Array, c(2), sum) # Sum each column across both matrices
print(Res.2)
COL1 COL2 COL3
0 0 0


R Data Structures
Arrays

Arrays, Cont’d

Res.3 <- apply(Array, c(3), sum) # Sum all items within each matrix
print(Res.3)
Matrix1 Matrix2
21 -21

Res.4 <- sum(Array) # Sum all items in the array
print(Res.4)
[1] 0
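
In apply(), the second argument (the margin) names the dimensions that are kept; the function is applied over everything else. The sketch below rebuilds a comparable array without dimension names and uses hypothetical result names (per_matrix, per_cell) to show a single margin and a pair of margins.

```r
A1 <- matrix(1:6, 2, 3, byrow = TRUE)
A2 <- -A1
Arr <- array(c(A1, A2), dim = c(2, 3, 2))

# Margin 3: keep the matrix index, sum over rows and columns
per_matrix <- apply(Arr, 3, sum)
print(per_matrix)

# Margins 1 and 2: keep rows and columns, sum across the two matrices
per_cell <- apply(Arr, c(1, 2), sum)
print(per_cell)
```

Because the second matrix is the negative of the first, the cell-wise sums are all zero while the per-matrix sums are 21 and -21.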


R Data Structures
Arrays

Arrays, Cont’d

# Multiply the two matrices element-wise
Res.5 <- Array[,,1] * Array[,,2]
print(Res.5)
COL1 COL2 COL3
ROW1 -1 -4 -9
ROW2 -16 -25 -36
# Matrix product of the 1st matrix and the transposed 2nd matrix
Res.6 <- Array[,,1] %*% t(Array[,,2])
print(Res.6)
ROW1 ROW2
ROW1 -14 -32
ROW2 -32 -77


Statistics
Statistics
set.seed (1234) # Set seed for reproducibility

x <- rnorm (1000 ,0 ,1) # Generate random sample

print(mean(x)) # Mean
[1] -0.0265972

print( median (x)) # Median


[1] -0.0397941

print(range (x)) # Range of x


[1] -3.396064 3.195901

Note: The set.seed() function fixes the seed of the random number generator, so that the same random
numbers, and hence the same results, can be reproduced in the future.
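
The reproducibility that set.seed() provides can be checked directly. In this minimal sketch the names first, second and third are assumptions used only for the illustration.

```r
# Same seed, same draws
set.seed(1234)
first <- rnorm(5)

set.seed(1234)
second <- rnorm(5)

print(identical(first, second))

# Without resetting the seed, the generator continues its stream,
# so the next draws differ from the first ones
third <- rnorm(5)
print(identical(first, third))
```

Resetting the seed immediately before each simulation is what makes a reported result repeatable on another run.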


Statistics

Statistics, Cont’d

print( quantile (x ,0.75) ) # 3rd Quartile


75%
0.6158186

print(mad(x)) # Median absolute deviation


[1] 0.9522307

print(var(x)) # Variance
[1] 0.9946825

print(sd(x)) # Standard deviation


[1] 0.9973377


Statistics

Statistics, Cont’d

print(min(x)) # Minimum value


[1] -3.396064

print(max(x)) # Maximum value


[1] 3.195901

print( summary (x)) # Statistics summary


Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.39606 -0.67325 -0.03979 -0.02660 0.61582 3.19590
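
Several of the statistics above are related to each other, and these relations make a useful sanity check. The sketch below regenerates the same kind of sample and verifies three of them; the name q is an assumption used only here.

```r
set.seed(1234)
x <- rnorm(1000)

# The standard deviation is the square root of the variance
print(all.equal(sd(x), sqrt(var(x))))

# The interquartile range is the 3rd quartile minus the 1st quartile
q <- quantile(x, c(0.25, 0.75))
print(all.equal(IQR(x), unname(q[2] - q[1])))

# range() returns the minimum and maximum together
print(all.equal(range(x), c(min(x), max(x))))
```

summary() reports the same quartiles, minimum, maximum, mean and median in a single call.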


R Plots

Simple Plot #1

set.seed (123)
x <- rnorm (500) # Generate sample x from N(0 ,1)
y <- x + rnorm (500) # Generate sample y
plot(x, y) # Plot samples x and y

[Figure: Simple Plot #1]

R Plots

Suppose we have four variables x, y, z1 and z2, defined respectively as:

x = rnorm(500)
y = x + rnorm(100)
$z_1 = x - 2y + 100$
$z_2 = (z_1 + 5)\,\dfrac{e^{z_1/100}}{\log(z_1)}$


R Plots

Simple Plot #2

set.seed (123)
x <- rnorm (500)
y <- x + rnorm (100)
z1 <- x - 2*y + 100
z2 <- (z1 +5)*(exp(z1/100)/log(z1))
plot(z1 , z2 , lwd = 3, col = "coral",
xlab = expression (z[1]) , ylab = expression (z[2]) ,
main = expression (
frac ((z [1]+5) *e^frac(z[1] ,100) ,log(z[1]))
)
)
# Plot sample z1 and z2

[Figure: Simple Plot #2]

R Plots
Suppose we have two sequences x1 and x2 as

x1 = 1 : 10
x2 = 1 : 10

Simple Plot #3

x1 <- 1:10; x2 <- 1:10
par(mfrow = c(2, 3))
plot(x1, x2, type = "l", main = "type='l'", lwd = 6)
plot(x1, x2, type = "s", main = "type='s'", lwd = 5)
plot(x1, x2, type = "p", main = "type='p'", lwd = 4)
plot(x1, x2, type = "o", main = "type='o'", lwd = 3)
plot(x1, x2, type = "b", main = "type='b'", lwd = 2)
plot(x1, x2, type = "h", main = "type='h'", lwd = 1)

[Figure: Simple Plot #3]

R Plots
Barplot

A barplot (or bar chart, bar graph) illustrates the association between a
numeric and a categorical variable.

x = rnorm(50)
y = x + rnorm(50)

Barplot Plot

barplot(x, col = 2) # Draw barplot

For more details, see https://statisticsglobe.com/barplot-in-r

[Figure: Barplot]

R Plots
Boxplot

A boxplot (or box-and-whisker plot) displays the distribution of a
numerical variable based on five statistics: minimum non-outlier,
first quartile, median, third quartile, and maximum non-outlier.

x = rnorm(50)
y = x + rnorm(50)

Boxplot Plot

boxplot(x, col = 2) # Draw boxplot

For more details, see https://statisticsglobe.com/boxplot-in-r

[Figure: Boxplot]

R Plots
Histogram

A histogram represents the frequencies of the values of a variable bucketed
into ranges. The height of each bar shows the number of observations
within each range.
x = rnorm(50)
y = x + rnorm(50)

Histogram Plot
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
par(mfrow = c(1, 2), oma = c(0, 0, 0, 0))
hist(x)
hist(y) # Draw histograms of x & y in one row
For more details, see https://www.tutorialspoint.com/r/r_histograms.htm
[Figure: Histogram]

R Plots
Density Plot

A density plot (kernel density plot or density trace) shows the distribution of a
numerical variable over a continuous interval.

x = rnorm(50)
y = x + rnorm(50)

Density Plot

set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
plot(density(x))
polygon(density(x), col = 1) # Draw density
For more details, see https://statisticsglobe.com/kernel-density-plot-in-base-r
[Figure: Density Plot]

R Plots
Histogram & Density Plot

A histogram with an overlaid kernel density curve in R.

x = rnorm(50)
y = x + rnorm(50)

Histogram & Density Plot

set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
hist(x, prob = TRUE) # Draw histogram and density
lines(density(x), lwd = 3, col = "red")
For more details, see https://statisticsglobe.com/kernel-density-plot-in-base-r
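
The red curve drawn by lines(density(x)) is a kernel density estimate computed from the sample. If a true Gaussian reference line is wanted as well, one possible approach (an addition for illustration, not part of the workshop code) is to overlay the standard normal density with curve() and dnorm().

```r
set.seed(123)
x <- rnorm(50)

# Histogram on the density scale so that curves can be overlaid
hist(x, prob = TRUE)

# Kernel density estimate computed from the sample
d <- density(x)
lines(d, lwd = 3, col = "red")

# Theoretical N(0, 1) density, drawn as a dashed reference line
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, lwd = 2, lty = 2, col = "blue")
```

With only 50 observations the two curves usually differ visibly; the gap tends to shrink as the sample size grows.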

[Figure: Histogram & Density Plot]

R Plots
The generic plot() function can draw a scatterplot, barplot, boxplot, time series plot,
time-based plot and the curve of a specified function, shown here in a 2×3 window.

General Plot
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500)
Data_1 <- ts(matrix(x, nrow = 500, ncol = 1), start = c(0, 1),
  frequency = 12)
Data_2 <- seq(as.Date("2005/1/1"), by = "month", length.out = 50)
Data_3 <- factor(mtcars$cyl)
Data_4 <- function(x) x^2
Data_5 <- rnorm(32)
Data_6 <- rnorm(50)

The ts() function converts a numeric vector into a time series object.


R Plots

General Plot, Cont’d


par(mfrow = c(2, 3), oma = c(0, 0, 0, 0))
# Scatterplot
plot(x, y, main = "Scatterplot")
# Barplot
plot(Data_3, main = "Barplot", xlab = "Data_3", col = 2)
# Boxplot
plot(Data_3, Data_5, main = "Boxplot", xlab = "Data_3",
  ylab = "Data_5", col = 3)
# Time series plot
plot(Data_1, main = "Time series", col = 4)
# Time-based plot
plot(Data_2, Data_6, main = "Time based plot",
  xlab = "Data_2", ylab = "Data_6", col = 6, lwd = 3)
# Plot a specified function
plot(Data_4, 0, 10, main = expression(x^2), col = 2, lwd = 4)
For more details, see https://r-coder.com/plot-r/

[Figure: General Plot]

R Plots
Heatmap

A heatmap (or shading matrix) visualizes the individual values of a matrix
with specified colors.
Heatmap
library(ggplot2); library(tidyr)
dir.create("data")
dir.create("output")
download.file(url = "https://tinyurl.com/mine-data-csv",
  destfile = "data/mine-data.csv")
mine.data <- read.csv(file = "data/mine-data.csv")
mine.long <- pivot_longer(data = mine.data,
  cols = -c(1:3), names_to = "Class", values_to = "Abundance")
mine.heatmap <- ggplot(data = mine.long,
  mapping = aes(x = Sample.name, y = Class, fill = Abundance)) +
  geom_tile() + xlab(label = "Sample")
mine.heatmap # Draw heatmap
For more details, see https://jcoliver.github.io/

R Plots

Heatmap, Cont’d
The data set in the Excel file is:
[Figure: data set]

[Figure: Heatmap]

R Plots
Pairs Plot

A pairs plot is a matrix of scatterplots, one for each
pair of variables in a data frame.
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
pairs(data.frame(x, y)) # Draw pairs

For more details, see https://statisticsglobe.com/r-pairs-plot-example/

[Figure: Pairs Plot]

R Plots
Pairs Plot

set.seed(123)
x1 <- rnorm(1000) # Create variable
x2 <- x1 + rnorm(1000, 0, 2)
x3 <- 3 * x1 - x2 + rnorm(1000, 0, 4)
PR <- data.frame(x1, x2, x3)
pairs(PR) # Draw pairs

For more details, see https://statisticsglobe.com/r-pairs-plot-example/

[Figure: Pairs Plot]

R Plots
Pairs Plot

Pairs Plot
set.seed(123)
library("ggplot2")
library("GGally")
x1 <- rnorm(1000) # Create variable x1
x2 <- x1 + rnorm(1000, 0, 2) # Create variable x2
x3 <- 3 * x1 - x2 + rnorm(1000, 0, 4) # Create variable x3
data <- data.frame(x1, x2, x3) # Combine all variables
ggpairs(data) # Apply ggpairs function
cor(x1, x2) # Correlation between x1 and x2
cor(x1, x3) # Correlation between x1 and x3
cor(x2, x3) # Correlation between x2 and x3
For more details, see https://statisticsglobe.com/r-pairs-plot-example/
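
The three pairwise correlations can also be obtained in one call, since cor() accepts a whole data frame and returns the correlation matrix. The sketch below needs only base R; the name R is an assumption used for the illustration.

```r
set.seed(123)
x1 <- rnorm(1000)
x2 <- x1 + rnorm(1000, 0, 2)
x3 <- 3 * x1 - x2 + rnorm(1000, 0, 4)
data <- data.frame(x1, x2, x3)

# Correlation matrix of all variables at once
R <- cor(data)
print(round(R, 3))

# Individual entries agree with the pairwise calls
print(all.equal(R["x1", "x3"], cor(x1, x3)))
```

The matrix is symmetric with ones on the diagonal, so each pairwise correlation appears twice.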

[Figure: Pairs Plot]

R Plots
QQ Plot

A quantile-quantile (QQ) plot compares the empirical quantiles obtained
in the sample versus the quantiles calculated from a theoretical
distribution.
set.seed(123)
x <- rnorm(1000)
qqnorm(x) # Draw QQ plot
qqline(x, lwd = 3, col = "red") # Add QQ line

For more details, see https://statisticsglobe.com/r-qqplot-qqnorm-qqline-function
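
To see what qqnorm() computes, the same plot can be built by hand: sort the sample and pair it with normal quantiles evaluated at evenly spread probabilities. This is a minimal sketch; the names theo, samp and qq are assumptions used only here.

```r
set.seed(123)
x <- rnorm(1000)

# Theoretical quantiles: normal quantiles at the plotting positions
theo <- qnorm(ppoints(length(x)))

# Empirical quantiles: the sorted sample
samp <- sort(x)

plot(theo, samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")

# qqnorm() uses the same theoretical quantiles, in the order of the data
qq <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(qq$x), theo))
```

Points lying close to a straight line suggest that the sample is consistent with the theoretical distribution.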

[Figure: QQ Plot]

R Plots
3D Plot

The persp() function draws 3D (perspective) surface plots in R. Its
arguments can be used to add a title, change the viewing direction, and
add color and shade to the plot. The example plots the surface
$G(x, y) = \sqrt{x^2 + y^2}$.

3D Plot

G <- function(x, y) sqrt(x^2 + y^2)

x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, G)
persp(x, y, z, main = "3D Plot", zlab = "Height", theta = 30,
      phi = 11, col = "cyan", shade = 0.3)    # Draw 3D plot

For more details, see https://www.geeksforgeeks.org/creating-3d-plots-in-r-programming-persp-function/
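
The role of outer() in the 3D example may be easier to see in a small sketch: it evaluates G at every (x, y) pair of the grid, so z[i, j] equals G(x[i], y[j]). The second persp() call only changes the viewing angles theta and phi; the specific angle and color values are arbitrary choices for illustration.

```r
G <- function(x, y) sqrt(x^2 + y^2)
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, G)

dim(z)                                  # 30 x 30 grid of heights
all.equal(z[3, 5], G(x[3], y[5]))       # each cell is G at one (x, y) pair

# Same surface viewed from a different direction
persp(x, y, z, theta = 120, phi = 35, col = "lightblue", shade = 0.5,
      ticktype = "detailed", main = "Same surface, different view")
```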

R Plots
Colors in R Plots

Outline

1 Overview
2 Data Structures
3 R Statistics
4 R Graphics
5 Inference
    Parameter Estimation
    Monte Carlo of Parameter Estimation
    Linear Regression Models
    Monte Carlo of Linear Regression Models

Inference

In statistics, we often use a parametric probability model to describe the
behavior of a random variable; the model depends on one or more unknown
parameters. The role of the data is to provide estimates of these
parameters. The three most widely used statistical estimation techniques
are:

Maximum Likelihood Estimation
Least-Squares Estimation
Bayesian Estimation

Hint: It would be interesting to investigate Bayesian MCMC methods, but
due to the time limit of this workshop, this part of statistical inference
is left for later.

Maximum Likelihood
MLE-Two dimensional

Here, the maximum likelihood estimation method is used to estimate the
parameter(s) of a target population (e.g., Weibull) given a sample.
MLE via maxLik function
set.seed(1234)                        # Set seed for reproducibility
library(maxLik)                       # Load maxLik package
alpha <- 2                            # Shape parameter value
lambda <- 1                           # Scale parameter value
x <- rweibull(20, alpha, lambda)      # Simulate random sample
n <- length(x)                        # No. of observations
LL <- function(param) {               # Set log-lik function
  alpha <- param[1]
  lambda <- param[2]
  # lambda enters the density as a rate; it matches rweibull()'s scale
  # only because lambda = 1 here
  sum(log(alpha * lambda * x^(alpha - 1) * exp(-lambda * x^alpha)))
}
fit <- maxLik(LL, start = c(alpha, lambda))
For more details, see Henningsen and Toomet (2011).
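
For reference, the log-likelihood coded in LL can be written out explicitly. This is a sketch derived from the density used in the code, with $x_1, \ldots, x_n$ denoting the observed sample:

```latex
f(x;\alpha,\lambda) = \alpha\lambda\, x^{\alpha-1} e^{-\lambda x^{\alpha}}, \qquad x > 0,

\ell(\alpha,\lambda) = \sum_{i=1}^{n} \log f(x_i;\alpha,\lambda)
  = n\log\alpha + n\log\lambda + (\alpha-1)\sum_{i=1}^{n}\log x_i
    - \lambda\sum_{i=1}^{n} x_i^{\alpha}.
```

Setting $\partial\ell/\partial\lambda = 0$ gives $\hat\lambda = n / \sum_{i=1}^{n} x_i^{\hat\alpha}$, so only the shape parameter requires a genuinely numerical search.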

MLE via maxLik function, Cont’d
summary(fit)
--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 12 iterations
Return code 0: successful convergence
Log-Likelihood: -10.36542
2 free parameters
Estimates:
     Estimate Std. error t value  Pr(> t)
[1,]   2.2905     0.3798   6.032 1.62e-09 ***
[2,]   0.8986     0.2184   4.115 3.87e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
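
As a rough cross-check that does not need the maxLik package, the same likelihood can be maximized with base R's optim() by minimizing the negative log-likelihood. This is only a sketch under the same density as above; the names negLL, opt, est and se are illustrative, and the optimizer settings (L-BFGS-B with a small positive lower bound) are assumptions rather than the author's choices.

```r
set.seed(1234)
x <- rweibull(20, shape = 2, scale = 1)

# Negative log-likelihood for f(x) = alpha*lambda*x^(alpha-1)*exp(-lambda*x^alpha)
negLL <- function(param) {
  alpha <- param[1]
  lambda <- param[2]
  -sum(log(alpha) + log(lambda) + (alpha - 1) * log(x) - lambda * x^alpha)
}

opt <- optim(c(2, 1), negLL, method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), hessian = TRUE)
est <- opt$par                             # point estimates (alpha, lambda)
se  <- sqrt(diag(solve(opt$hessian)))      # asymptotic standard errors
cbind(Estimate = est, Std.error = se)
```

The standard errors come from inverting the Hessian of the negative log-likelihood at the optimum, which is the usual large-sample approximation.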

Least-Squares
LSE-Two dimensional

Here, the least-squares estimation method is used to find the best fit for
the parameter(s) of a target population (e.g., Weibull) based on a data set
by minimizing the sum of squared differences between the theoretical
and empirical CDFs.
LSE
set.seed(1234)                        # Set seed for reproducibility
alpha <- 2                            # Shape parameter value
lambda <- 1                           # Scale parameter value
start <- c(alpha, lambda)             # Start values
x <- rweibull(20, alpha, lambda)      # Simulate random sample
n <- length(x)                        # No. of observations
lower <- c(0, 0); upper <- c(+Inf, +Inf)
Dweibull <- function(y, param) {      # Weibull CDF
  alpha <- param[1]
  lambda <- param[2]
  1 - exp(-lambda * y^alpha)
}

LSE, Cont’d
LSE <- function(param, y, CDF) {      # Set an objective function
  y <- sort(y)                        # Use the order statistics
  n <- length(y)
  D <- rep(0, n)
  for (i in 1:n) {
    D[i] <- (CDF(y[i], param) - i/(n + 1))^2
  }
  sum(D)
}

OLS <- function(CDF, start, data, lim_inf, lim_sup) {
  fit <- nlminb(start = start, objective = LSE, y = data, CDF = CDF,
                lower = lim_inf, upper = lim_sup)
  return(fit$par)                     # nlminb() stores the estimates in $par
}
For more details, see Swain et al. (1988).
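
The objective compares the model CDF at the i-th order statistic with the plotting position i/(n+1). Because the CDF is vectorized, the loop can be replaced by a single expression. The sketch below is one possible rewrite, not the author's code; cdf_w and lse_vec are hypothetical helper names.

```r
# Weibull CDF with param = c(alpha, lambda), as in the slides
cdf_w <- function(y, param) 1 - exp(-param[2] * y^param[1])

# Vectorized least-squares objective
lse_vec <- function(param, y, CDF) {
  y <- sort(y)                                   # order statistics
  n <- length(y)
  sum((CDF(y, param) - seq_len(n) / (n + 1))^2)  # squared CDF differences
}

set.seed(1234)
x <- rweibull(20, shape = 2, scale = 1)
fit <- nlminb(start = c(2, 1), objective = lse_vec, y = x, CDF = cdf_w,
              lower = c(0, 0), upper = c(Inf, Inf))
fit$par
```

The extra arguments y and CDF are passed through nlminb() to the objective, so no global variables are needed.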

OLS_weibull <- OLS(Dweibull, start, x, lower, upper)
print(OLS_weibull)

[1] 2.1672734 0.9257518

summary(OLS_weibull)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.9258  1.2361  1.5465  1.5465  1.8569  2.1673

Monte Carlo simulation is a computerized mathematical technique that
generates random sample data from a known distribution for numerical
experiments. It is applied to quantitative risk analysis and
decision-making problems. The method was first used in the 1940s by
scientists working on the atomic bomb. Today it is used by professionals
in fields such as finance, project management, energy, manufacturing,
engineering, research and development, insurance, oil and gas, and
transportation.

Monte Carlo Simulation


Introduction

Monte Carlo Simulation


The main properties of the Monte Carlo method are:
  It depends on generating random samples.
  Its input distribution must be known.
  Its result must be known while performing an experiment.

Monte Carlo Simulation, Cont’d


The main advantages of the Monte Carlo method are:
  It is easy to implement.
  It provides statistical sampling for numerical experiments.
  It provides approximate solutions to complex mathematical problems.
The main disadvantages of the Monte Carlo method are:
  It can be time-consuming to obtain the desired output.
  Its results are only approximations of the true values, not exact.
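
A small illustration of these points, not taken from the slides: estimating pi from uniform random points shows how easy the method is to implement and how the result is only an approximation whose error tends to shrink as the number of replications N grows. The function name mc_pi, the seed and the sample sizes are arbitrary choices.

```r
set.seed(2022)                        # arbitrary seed for reproducibility

mc_pi <- function(N) {
  u <- runif(N)                       # random points in the unit square
  v <- runif(N)
  4 * mean(u^2 + v^2 <= 1)            # share inside the quarter circle, times 4
}

sizes <- c(100, 10000, 1000000)
est <- sapply(sizes, mc_pi)
data.frame(N = sizes, estimate = est, abs_error = abs(est - pi))
```

The larger runs take longer, which reflects the time cost listed among the disadvantages.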

Monte Carlo Simulation

Now, we'll discuss an R script that runs a Monte Carlo simulation for the
Weibull parameters, based on complete samples, using two classical
estimation methods:

Maximum Likelihood Estimation
Least-Squares Estimation

Monte Carlo Simulation


Maximum Likelihood

Step #1 - Determine the inputs


set.seed(1234)                        # Put set.seed
library(maxLik)                       # Run maxLik package
alpha = 0.7; lambda = 0.4             # Set true parameter values
N = 10000; n = 50                     # Set no. of replications; sample size
w = x = matrix(0, N, n)               # Matrix of generated samples
Est = matrix(0, N, 2)                 # Matrix of simulation outputs
for (i in 1:N) {
  w[i, ] <- runif(n, 0, 1)            # Uniform(0, 1) draws
  x[i, ] <- ((-1/lambda) * log(1 - w[i, ]))^(1/alpha)  # Inverse-CDF sampling
  LL <- function(param) {             # Set log-lik function
    alpha <- param[1]
    lambda <- param[2]
    sum(log(
      alpha * lambda * x[i, ]^(alpha - 1) * exp(-lambda * x[i, ]^alpha)
    ))
  }
  Est[i, ] <- maxLik(LL, start = c(alpha, lambda))$estimate
}
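
The sampling step in the loop is the inverse-CDF (inverse transform) method: solving u = 1 - exp(-lambda * x^alpha) for x gives x = (-log(1 - u)/lambda)^(1/alpha). As a hedged sanity check, this should agree with base R's qweibull() when its scale argument is set to lambda^(-1/alpha). The names u, x_inv and x_q are illustrative.

```r
alpha <- 0.7
lambda <- 0.4
set.seed(1234)
u <- runif(5)

# Formula used in the simulation loop
x_inv <- ((-1 / lambda) * log(1 - u))^(1 / alpha)

# Same quantiles via base R, converting the rate-type lambda to a scale
x_q <- qweibull(u, shape = alpha, scale = lambda^(-1 / alpha))

all.equal(x_inv, x_q)
```

This also shows why rweibull(n, alpha, lambda) would not be a drop-in replacement here unless lambda equals 1.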

Step #2 - Run Monte Carlo experiment and get outputs

MLE_1 = mean(Est[, 1]); MLE_2 = mean(Est[, 2])
MSE_1 = mean((Est[, 1] - alpha)^2); MSE_2 = mean((Est[, 2] - lambda)^2)
MAB_1 = mean(abs(Est[, 1] - alpha)); MAB_2 = mean(abs(Est[, 2] - lambda))

Res_1 = c(MLE_1, MSE_1, MAB_1); Res_1   # Av.Est; MSE; MAB of alpha

[1] 0.719777654 0.007196305 0.065652642

Res_2 = c(MLE_2, MSE_2, MAB_2); Res_2   # Av.Est; MSE; MAB of lambda

[1] 0.398973150 0.006927434 0.066228848
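
The three summaries computed above can be written as follows, where $\theta$ is the true parameter value and $\hat\theta^{(i)}$ is its estimate in replication $i = 1, \ldots, N$ (notation introduced here for illustration):

```latex
\text{Av.Est}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\hat\theta^{(i)}, \qquad
\text{MSE}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat\theta^{(i)}-\theta\bigr)^{2}, \qquad
\text{MAB}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat\theta^{(i)}-\theta\bigr|.
```

An average estimate close to $\theta$ together with small MSE and MAB values indicates that the estimator performs well for the chosen sample size.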

Monte Carlo Simulation


Least-Squares

Step #1 - Determine the inputs


set.seed(1234)                        # Put set.seed
alpha = 0.7; lambda = 0.4             # Set true parameter values
N = 10000                             # Set no. of replications
n = 50                                # Set sample size
w = x = matrix(0, N, n)               # Matrix of generated samples
Est = matrix(0, N, 2)                 # Matrix of simulation outputs
lower = c(0, 0); upper = c(+Inf, +Inf)

cdf_weibull = function(x, param) {    # Weibull CDF
  alpha = param[1]
  lambda = param[2]
  res = 1 - exp(-lambda * x^alpha)
  return(res)
}

Step #1 - Determine the inputs, Cont’d


LSE = function(param, x, cdf) {       # Set an objective function
  x = sort(x)
  n = length(x)
  D = rep(0, n)
  for (i in 1:n) {
    D[i] = (cdf(x[i], param) - i/(n + 1))^2
  }
  sum(D)
}

OLS = function(cdf, start, x, lim_inf, lim_sup) {
  fit = nlminb(start = start, objective = LSE, x = x, cdf = cdf,
               lower = lim_inf, upper = lim_sup)
  return(fit$par)
}

Step #2 - Run Monte Carlo experiment and get outputs


for (i in 1:N) {
  w[i, ] <- runif(n, 0, 1)
  x[i, ] <- sort(((-1/lambda) * log(1 - w[i, ]))^(1/alpha))
  Est[i, ] <- OLS(cdf = cdf_weibull, start = c(alpha, lambda), x = x[i, ],
                  lim_inf = lower, lim_sup = upper)
}

LSE_1 = mean(Est[, 1]); LSE_2 = mean(Est[, 2])
MSE_1 = mean((Est[, 1] - alpha)^2); MSE_2 = mean((Est[, 2] - lambda)^2)
MAB_1 = mean(abs(Est[, 1] - alpha)); MAB_2 = mean(abs(Est[, 2] - lambda))

Res_1 = c(LSE_1, MSE_1, MAB_1); Res_1   # Av.Est; MSE; MAB of alpha

[1] 0.702144763 0.009249139 0.075082418

Res_2 = c(LSE_2, MSE_2, MAB_2); Res_2   # Av.Est; MSE; MAB of lambda

[1] 0.406889288 0.007449279 0.068068669

Linear Regression
Simple

Regression analysis is a widely used statistical method for modeling the
relationship between two variables. One of these variables is called the
predictor variable, whose values are gathered through experiments. The
other is called the response variable, whose values are modeled as a
function of the predictor variable.
Simple Linear Regression
The general mathematical expression for a simple linear regression is:
y = ax + b

y - response variable.
x - predictor variable.
a & b - regression coefficients (a is the slope, b is the intercept).

Simple Linear Regression, Cont’d


In R, the lm() function is used to create the relationship model between
the predictor and the response variable. The basic syntax of lm() in the
case of simple linear regression is
lm(formula, data)

formula - an expression describing the relation between x and y (e.g., y ~ x).
data - the data frame containing the variables used in the formula.

The following example examines the relationship between height (x, in
meters) and goal success percentage (y). The simple linear regression
model of y on x is fitted as:
y <- c(0.63, 0.81, 0.56, 0.91, 0.47, 0.57, 0.76, 0.72, 0.62, 0.48)
x <- c(1.51, 1.74, 1.38, 1.86, 1.28, 1.36, 1.79, 1.63, 1.52, 1.31)

mydata <- data.frame(y = y, x = x)

model <- lm(y ~ x, data = mydata)     # Apply lm() function
print(summary(model))                 # Call fit results
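
To connect lm() with the formula y = ax + b, the sketch below computes the least-squares slope and intercept by hand (slope = cov(x, y)/var(x), intercept = mean(y) - slope * mean(x)) and compares them with the coefficients returned by lm(). The names slope, intercept, by_hand and fit are illustrative.

```r
y <- c(0.63, 0.81, 0.56, 0.91, 0.47, 0.57, 0.76, 0.72, 0.62, 0.48)
x <- c(1.51, 1.74, 1.38, 1.86, 1.28, 1.36, 1.79, 1.63, 1.52, 1.31)

slope     <- cov(x, y) / var(x)             # a in y = ax + b
intercept <- mean(y) - slope * mean(x)      # b in y = ax + b
by_hand   <- c(intercept, slope)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), by_hand)       # lm() reports intercept first
```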

Call:
lm(formula = y ~ x)

Residuals:
      Min        1Q    Median        3Q       Max
-0.063002 -0.016629  0.000412  0.018944  0.039775

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.38455    0.08049  -4.778  0.00139 **
x            0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03253 on 8 degrees of freedom
Multiple R-squared:  0.9548,  Adjusted R-squared:  0.9491
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.164e-06

print(model)                          # Call the fitted model

Call:
lm(formula = y ~ x, data = mydata)

Coefficients:
(Intercept)            x
    -0.3846       0.6746

a <- data.frame(x = 1.75)

pred <- predict(model, a)             # Predict y if x is 1.75
print(pred)
        1
0.7960174
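
The prediction is simply the fitted line evaluated at the new x value. The sketch below reproduces it by hand and, as an optional extra not shown in the slides, asks predict() for a 95% confidence interval around the fitted mean. The names new_x, by_hand and pred_ci are illustrative.

```r
y <- c(0.63, 0.81, 0.56, 0.91, 0.47, 0.57, 0.76, 0.72, 0.62, 0.48)
x <- c(1.51, 1.74, 1.38, 1.86, 1.28, 1.36, 1.79, 1.63, 1.52, 1.31)
model <- lm(y ~ x, data = data.frame(y = y, x = x))

new_x   <- 1.75
by_hand <- unname(coef(model)[1] + coef(model)[2] * new_x)  # intercept + slope * x

pred_ci <- predict(model, data.frame(x = new_x),
                   interval = "confidence", level = 0.95)
pred_ci                                  # columns: fit, lwr, upr
all.equal(by_hand, unname(pred_ci[1, "fit"]))
```

Using interval = "prediction" instead would give a wider interval for a single new observation rather than for the mean response.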

plot(x, y, cex = 1.3, pch = 16, col = "blue",
     main = "Height & Goal Success Regression",
     xlab = "Height in Meters",
     ylab = "Goal Success Percentage")    # Plot the data
abline(lm(y ~ x), lwd = 3)                # Add the fitted regression line

Linear Regression
Multiple

Multiple regression extends simple linear regression to the relationship
between more than two variables. In simple linear regression we have one
predictor and one response variable, whereas in multiple regression we have
more than one predictor variable and one response variable.
Multiple Linear Regression
The general mathematical expression for a multiple linear regression is:
$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$

y - response variable.
$x_1, x_2, \ldots, x_n$ - predictor variables.
$a, b_1, b_2, \ldots, b_n$ - regression coefficients.

Multiple Linear Regression, Cont’d


In R, the lm() function is used to create the relationship model between
the predictors and the response variable. The basic syntax of lm() in the
case of multiple linear regression is
lm(formula, data)

formula - an expression describing the relation between the response
variable and the predictor variables (e.g., y ~ x1 + x2).
data - the data frame containing the variables used in the formula.

The following example examines the relationship between height (x1),
age (x2), and goal success percentage (y). The multiple linear
regression model of y on x1 and x2 is fitted as:
y  <- c(0.63, 0.81, 0.56, 0.91, 0.47, 0.57, 0.76, 0.72, 0.62, 0.48)
x1 <- c(1.51, 1.74, 1.38, 1.86, 1.28, 1.36, 1.79, 1.63, 1.52, 1.31)
x2 <- c(25, 22, 19, 20, 28, 26, 29, 31, 25, 24)

mydata <- data.frame(y = y, x1 = x1, x2 = x2)

model <- lm(y ~ x1 + x2, data = mydata)   # Apply lm() function

print(summary(model))                     # Call fit results
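
For intuition about what lm() estimates here, the coefficients solve the normal equations (X'X) b = X'y, where X is the design matrix with a column of ones for the intercept. The sketch below checks this on the same data; X and beta_hat are illustrative names, and lm() itself uses a QR decomposition rather than this direct solve.

```r
y  <- c(0.63, 0.81, 0.56, 0.91, 0.47, 0.57, 0.76, 0.72, 0.62, 0.48)
x1 <- c(1.51, 1.74, 1.38, 1.86, 1.28, 1.36, 1.79, 1.63, 1.52, 1.31)
x2 <- c(25, 22, 19, 20, 28, 26, 29, 31, 25, 24)

X <- cbind(Intercept = 1, x1 = x1, x2 = x2)         # design matrix
beta_hat <- solve(crossprod(X), crossprod(X, y))    # solves (X'X) b = X'y

fit <- lm(y ~ x1 + x2)
all.equal(unname(coef(fit)), as.vector(beta_hat), tolerance = 1e-6)
```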

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.045299 -0.018143 -0.000696  0.018518  0.040738

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.277006   0.101562  -2.727   0.0294 *
x1           0.670162   0.047952  13.976 2.27e-06 ***
x2          -0.004044   0.002607  -1.551   0.1647
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03 on 7 degrees of freedom
Multiple R-squared:  0.9664,  Adjusted R-squared:  0.9567
F-statistic: 100.5 on 2 and 7 DF,  p-value: 6.988e-06

print(model)                          # Call the fitted model

Call:
lm(formula = y ~ x1 + x2, data = mydata)

Coefficients:
(Intercept)           x1           x2
  -0.277006     0.670162    -0.004044

a <- data.frame(x1 = 1.75, x2 = 25)

pred <- predict(model, a)    # Predict y if x1 is 1.75 and x2 is 25
print(pred)
        1
0.7946699

Monte Carlo Simulation


Linear Regression

Now, we'll discuss R scripts that run Monte Carlo simulations of both
simple and multiple linear regression models.

Monte Carlo Simulation


Simple Linear Regression

Monte Carlo Simulation of Simple Linear Regression


set.seed(1234)                         # Put set.seed
N = 10000                              # Set no. of replications
n = 50; sd = 2                         # Set sample size; St.d
beta_0 = 150; beta_1 = 4               # Set true values
Est = MSE = MAB = matrix(0, N, 2)      # Set output matrices
linear = function(n, beta_0, beta_1) { # Set linear model
  x1 = rnorm(n)
  error = rnorm(n, 0, sd)
  y = beta_0 + beta_1*x1 + error
  mydata = data.frame(y, x1)
  lm(y ~ x1, data = mydata)            # Return the fitted model
}
Res = replicate(N, linear(n, beta_0, beta_1), simplify = FALSE)

theta = c(beta_0, beta_1)

for (i in 1:N) {
  Est[i, ] = coef(Res[[i]])                          # Store estimates
  MSE[i, ] = (theta - coef(Res[[i]]))^2              # Squared errors
  MAB[i, ] = abs(theta - coef(Res[[i]]))/abs(theta)  # Relative absolute errors
}
Reg_1 = mean(Est[, 1]); Reg_2 = mean(Est[, 2])
MSE_1 = mean(MSE[, 1]); MSE_2 = mean(MSE[, 2])
MAB_1 = mean(MAB[, 1]); MAB_2 = mean(MAB[, 2])

Res_1 = c(Reg_1, MSE_1, MAB_1); Res_1

[1] 149.999998 0.08201119 0.00151946

Res_2 = c(Reg_2, MSE_2, MAB_2); Res_2

[1] 3.99619458 0.08534729 0.05788384
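
The same experiment can be written more compactly by returning only the coefficient vector from each replication and summarizing the resulting matrix column-wise. The sketch below is one possible variant, not the author's script: it uses fewer replications (N = 500) so that it runs quickly, and the names one_fit, sigma and err are illustrative.

```r
set.seed(1234)
N <- 500                              # fewer replications than the slides
n <- 50
sigma <- 2                            # error standard deviation
theta <- c(150, 4)                    # true intercept and slope

one_fit <- function() {
  x1 <- rnorm(n)
  y  <- theta[1] + theta[2] * x1 + rnorm(n, 0, sigma)
  unname(coef(lm(y ~ x1)))            # keep only the two estimates
}

Est <- t(replicate(N, one_fit()))     # N x 2 matrix of estimates
err <- sweep(Est, 2, theta)           # estimate minus true value, by column

rbind(AvEst = colMeans(Est),
      MSE   = colMeans(err^2),
      MAB   = colMeans(abs(err)) / abs(theta))
```

Storing only the coefficients instead of N full lm objects also keeps memory use low when N is large.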

Monte Carlo Simulation


Multiple Linear Regression

Monte Carlo Simulation of Multiple Linear Regression


set.seed(1234)                          # Put set.seed
N = 10000                               # Set no. of replications
n = 50; sd = 2; ngroups = 2             # Set sample size; St.d; no. of predictors
beta_0 = 5; beta_1 = -2; beta_2 = 4     # Set true parameter values
Est = MSE = MAB = matrix(0, N, 3)       # Set output matrices
# Generate target lm
linear = function(n, beta_0, beta_1, beta_2) {
  x = rnorm(ngroups*n)
  x1 = x[1:n]
  x2 = x[(n + 1):(ngroups*n)]
  error = rnorm(n, 0, sd)
  y = beta_0 + beta_1*x1 + beta_2*x2 + error
  mydata = data.frame(y, x1, x2)
  lm(y ~ x1 + x2, data = mydata)        # Return the fitted model
}
Res = replicate(N, linear(n, beta_0, beta_1, beta_2), simplify = FALSE)

theta = c(beta_0, beta_1, beta_2)

for (i in 1:N) {
  Est[i, ] = coef(Res[[i]])
  MSE[i, ] = (theta - coef(Res[[i]]))^2
  # Divide by abs(theta) so the measure stays positive when beta_1 < 0
  MAB[i, ] = abs(theta - coef(Res[[i]]))/abs(theta)
}
# Calculate Av. Ests
Reg_1 = mean(Est[, 1]); Reg_2 = mean(Est[, 2]); Reg_3 = mean(Est[, 3])
# Calculate MSEs
MSE_1 = mean(MSE[, 1]); MSE_2 = mean(MSE[, 2]); MSE_3 = mean(MSE[, 3])
# Calculate MABs
MAB_1 = mean(MAB[, 1]); MAB_2 = mean(MAB[, 2]); MAB_3 = mean(MAB[, 3])

Res_1 = c(Reg_1, MSE_1, MAB_1); Res_1

[1] 4.99598587 0.08406701 0.04635600

Res_2 = c(Reg_2, MSE_2, MAB_2); Res_2

[1] -2.0000555 0.0876486 0.1174803

Res_3 = c(Reg_3, MSE_3, MAB_3); Res_3

[1] 3.99709386 0.08689668 0.05820485

Chang, W. (2018). R Graphics Cookbook: Practical Recipes for
Visualizing Data. O'Reilly Media, Inc.
Crawley, M. J. (2012). The R Book. John Wiley & Sons.
Davies, T. M. (2016). The Book of R: A First Course in Programming and
Statistics. No Starch Press.
De Brouwer, P. J. (2020). The Big R-Book: From Data Science to
Learning Machines and Big Data. John Wiley & Sons.
Grolemund, G. (2014). Hands-on Programming with R: Write Your Own
Functions and Simulations. O'Reilly Media, Inc.
Henningsen, A. and Toomet, O. (2011). maxLik: A package for maximum
likelihood estimation in R. Computational Statistics, 26(3):443–458.
Kabacoff, R. I. (2015). R in Action: Data Analysis and Graphics with R.
Simon and Schuster, Shelter Island, New York.
Lander, J. P. (2014). R for Everyone: Advanced Analytics and Graphics.
Pearson Education.
Matloff, N. (2011). The Art of R Programming: A Tour of Statistical
Software Design. No Starch Press.
Mount, J. and Zumel, N. (2019). Practical Data Science with R. Simon
and Schuster, Shelter Island, New York.
Swain, J. J., Venkatraman, S., and Wilson, J. R. (1988). Least-squares
estimation of distribution functions in Johnson's translation system.
Journal of Statistical Computation and Simulation, 29(4):271–297.
Wickham, H. and Grolemund, G. (2016). R for Data Science: Import,
Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc.