
Statistical Computing with R:

Masters in Data Sciences 503 (S29)


Third Batch, SMS, TU, 2024
Shital Bhandary
Associate Professor
Statistics/Bio-statistics, Demography and Public Health Informatics
Patan Academy of Health Sciences, Lalitpur, Nepal
Faculty, Data Analysis and Decision Modeling, MBA, Pokhara University, Nepal
Faculty, FAIMER Fellowship in Health Professions Education, India/USA.
Review Preview:
• Monte Carlo simulations
• Randomness
• Random deviates
• Resampling
• Use of Monte Carlo methods in Machine Learning
• Class imbalance problem
  • Statistical approach
  • Data science approach
• Missing data
• Supervised learning
• Unsupervised learning
Monte Carlo Simulations:
https://bstaton1.github.io/au-r-workshop/ch4.html
• Simulation modeling is one of the primary reasons to move away from spreadsheet-type programs (like Microsoft Excel) and into a program like R.
• R allows us to replicate the same (possibly complex and detailed) calculations over and over with different random values.
• We can then summarize and plot the results of these replicated calculations all within the same program.
• Analyses of this type are called Monte Carlo methods: they randomly sample from a set of quantities for the purpose of generating and summarizing a distribution of some statistic related to the sampled quantities.
Randomness:
• A critical part of simulation modeling is the use of random processes.
• A random process is one that generates a different outcome according to some rules each time it is executed.
• Random processes are tightly linked to the concept of uncertainty: you are unsure about the outcome the next time the process is executed.
• There are two basic ways to introduce randomness in R:
  • Random deviates
  • Resampling
Random deviates:
• At the end of each year, each individual alive at the start can either live or die. There are two outcomes here, and suppose each individual has an 80% chance of surviving.
• The number of individuals that survive is the result of a binomial random process in which there were n individuals alive at the start of this year and p is the probability that any one individual survives to the next year.
• We can execute a binomial random process with p = 0.8 and n = 100 like this in R:

rbinom(n = 1, size = 100, prob = 0.8)

• I got:

[1] 83

• But you will almost certainly get a different number than this one!
We can also plot it with a bit of tweaking:
# Histogram of 1000 replicated binomial draws
survivors = rbinom(1000, 100, 0.8)
hist(survivors, col = "skyblue")
We could also use other processes, like the lognormal:
• Another random process is the lognormal process.
• It generates random numbers such that the log of the values is normally distributed with mean equal to logmean and standard deviation equal to logsd.

hist(rlnorm(1000, 0, 0.1), col = "skyblue")
Need for sampling:
https://machinelearningmastery.com/monte-carlo-sampling-for-probability/

• There are many problems in probability, and more broadly in machine learning, where we cannot calculate an analytical solution directly.
• The class imbalance problem is one such situation in Machine Learning!
• In fact, there may be an argument that exact inference is intractable for most practical probabilistic models.
• The desired calculation is typically a sum over a discrete distribution or an integral over a continuous distribution, and is intractable to calculate.
• The calculation may be intractable for many reasons, such as the large number of random variables, the stochastic nature of the domain, noise in the observations, the lack of observations, and more.
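A hedged aside (not from the slides): a minimal sketch of what Monte Carlo sampling buys us, approximating an integral with no elementary closed form by a sample average over random draws.

# A minimal sketch: Monte Carlo integration of exp(-x^2) over [0, 1].
# The integral has no elementary closed form; we approximate it by
# averaging the integrand at uniform random draws.
set.seed(1234)
x <- runif(100000)    # 100,000 draws from Uniform(0, 1)
mean(exp(-x^2))       # should be close to the true value, ~0.7468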
Resampling:
• Using random deviates works great for creating new random numbers, but what if we already have a set of numbers that we wish to introduce randomness to?
• For this, we can use resampling techniques.
• In R, the sample() function is used to sample size elements from the vector x.

# Resampling of 1 to 10:
sample(x = 1:10, size = 5)

# Sample with replacement
sample(x = c("a", "b", "c"), size = 10, replace = T)

# Sample with set probabilities
sample(x = c("live", "die"), size = 10, replace = T, prob = c(0.8, 0.2))
We have used it:
• roll() function defining the roll of a fair die twice
• Training and testing set definition, cross-validation
A sketch of both uses follows.
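# A minimal sketch (assumed reconstruction, not the exact class code):
# one possible roll() giving two rolls of a fair die
roll <- function() sample(x = 1:6, size = 2, replace = TRUE)
roll()

# A common 70/30 train/test partition via resampling
# (iris is a stand-in dataset here)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[ind == 1, ]
test <- iris[ind == 2, ]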
Reproducing randomness:
• For reproducibility purposes, we may wish to get the same exact random numbers each time we run our script.
• To do this, we need to set the random seed, which is the starting point of the random number generator our computer uses.

# Example:
set.seed(1234)
rnorm(1)
[1] -1.207066

# Try again without resetting the seed
rnorm(1)
[1] 0.2774292
Replication:
• To use Monte Carlo methods, we need to be able to replicate some random process many times.
• There are two main ways this is commonly done: either with replicate() or with for() loops.
• The replicate() function executes some expression many times and returns the output from each execution.
• Say we have a vector x, which represents 30 observations of animal length (mm):

x = rnorm(30, 500, 30)
Replication in R:
• We wish to build the sampling distribution of the mean length “by hand”.
• We can sample randomly from x, calculate the mean, then repeat this process many times.
• This can be done in R with:

# Code after x is defined:
means = replicate(n = 1000, expr = {
  x_i = sample(x, length(x), replace = T)
  mean(x_i)
})
Mean and SE same in x and 1000 replicated means of x? Unbiased estimate of x!
• If we take mean(means) and sd(means), they should be very similar to mean(x) and se(x).
• Create the se() function and prove this using R!

se = function(x) sd(x)/sqrt(length(x))

# Check means first
mean(means); mean(x)
[1] 492.5897
[1] 492.6636

# Standard error of the mean
sd(means); se(x)
[1] 5.130683
[1] 5.023584

Monte Carlo simulation is based on the Law of Large Numbers. It can also be used to demonstrate Regression to the Mean and the Central Limit Theorem. More on the Law of Large Numbers here: https://machinelearningmastery.com/a-gentle-introduction-to-the-law-of-large-numbers-in-machine-learning/
Replication with “for” loop:
• In programming, a loop is a command that does something over and over until it reaches some point that you specify.
• R has a few types of loops: repeat(), while(), and for(), to name a few.
• for() loops are among the most common in simulation modeling.
• A for() loop repeats some action for however many times you tell it, once for each value in some vector.

# For loop syntax:
for (var in seq) {
  expression(var)
}
Examples:
# 1
for (i in 1:5) {
  print(i^2)
}

# Output 1
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25

# 2
results = numeric(5)
for (i in 1:5) {
  results[i] = i^2
}
results

# Output 2
[1] 1 4 9 16 25
More:
nt = 100      # number of years
N = NULL      # container for (fish) abundance
N[1] = 1000   # first end-of-year abundance

# Loop for replication
for (t in 2:nt) {
  # N this year is N last year * growth * randomness * fraction that survives harvest
  N[t] = (N[t-1] * 1.1 * rlnorm(1, 0, 0.1)) * (1 - 0.08)
}
Let’s plot it:
plot(N, type = "l", pch = 15, xlab = "Year", ylab = "Abundance")
Function writing for Monte Carlo simulation:
# In Monte Carlo analyses, it is often useful to wrap code into functions.
• This allows for easy replication and setting adjustment (e.g., if you wanted to compare the growth trajectories of two populations with differing growth rates).

# Let’s use five parameters to do so now:
• nt: the number of years
• grow: the population growth rate
• sd_grow: the amount of annual variability in the growth rate
• U: the annual exploitation rate
• plot: whether you wish to have a plot created

pop_sim = function(nt, grow, sd_grow, U, plot = F) {
  N = NULL
  N[1] = 1000
  for (t in 2:nt) {
    N[t] = (N[t-1] * grow * rlnorm(1, 0, sd_grow)) * (1 - U)
  }
  if (plot) {
    plot(N, type = "l", pch = 15, xlab = "Year", ylab = "Abundance")
  }
  N
}
Run: pop_sim(100, 1.1, 0.1, 0.08, T) to get (your numbers will differ):

[1] 1000.0000 982.3888 802.9221 930.8944 942.8799 1147.2425 1343.0696
[8] 1547.2829 1679.2181 1514.6867 1513.1179 1560.9256 1736.7056 2135.8081
[15] 2106.6725 1775.4615 1665.7489 1623.7020 1589.0171 1889.1755 2029.1288
[22] 2170.6199 2058.1873 2038.3532 2347.5983 2290.1806 2671.5877 2598.8134
[29] 2738.5065 2669.0003 2617.4264 2859.6799 2764.8132 2694.8130 2388.6001
[36] 2057.0187 2041.2244 2351.3923 2395.7745 2151.1563 2509.3455 2943.5983
[43] 2599.5925 2706.5242 2710.5283 2587.0943 2696.7068 2573.2741 2267.4747
[50] 2676.7501 2638.4771 2306.5914 2464.6563 2126.1586 2090.3945 2131.9059
[57] 2676.4949 2435.6190 2128.2608 2225.5276 2179.7877 2706.6805 2989.4001
[64] 3277.0129 3609.7139 3843.7520 4117.0917 4546.7481 4706.4806 5077.8774
[71] 6248.7845 5797.8300 5824.2902 5400.6019 4948.3756 4747.5507 5046.0663
[78] 5894.9432 6207.3198 6030.4074 6706.5260 6884.1739 6946.1890 7204.8305
[85] 7895.0993 7563.0521 8655.1318 8783.0285 7210.3333 8300.1920 10254.6761
[92] 9983.9319 10467.9362 9487.7283 9186.1128 10096.7386 8892.1724 11403.9986
[99] 11699.4072 12772.6927
Replicating the simulation:
# Replicate the simulation 1000 times
out = replicate(n = 1000, expr = pop_sim(100, 1.1, 0.1, 0.08, F))

# out is a large matrix (100,000 elements, 800.2 kB): 100 years × 1000 replicates

# View this matrix in RStudio:
View(out)
Summarization of simulation:
• After replicating a calculation many times, we will need to summarize the results.
• We must show the central tendency and variability.
• We can also show frequencies and cross-tabulations.

# Central tendency: the mean across the 1000 replicates for each year
N_mean = apply(out, 1, mean)
N_mean[1:10]

# Variability: the standard deviation
N_sd = apply(out, 1, sd)
N_sd[1:10]
Summarization of simulation:
# Frequencies 1
out10 = ifelse(out[10, ] < 1000, "less10", "greater10")
table(out10)

# Frequencies 2
out20 = ifelse(out[20, ] < 1100, "less20", "greater20")
table(out20)

# Cross-tabulations
table(out10, out20)

# Cross-tabulations with proportions
round(table(out10, out20)/1000, 2)
Simulation Based Learning: Example 1
mu = 500; sig = 30
random = rnorm(100, mu, sig)
p = seq(0.01, 0.99, 0.01)
random_q = quantile(random, p)
normal_q = qnorm(p, mu, sig)

# Compare empirical and theoretical quantiles (a Q-Q plot)
plot(normal_q ~ random_q); abline(c(0, 1))
Simulation Based Learning: Example 2
q = seq(400, 600, 10)
random_cdf = ecdf(random)
random_p = random_cdf(q)
normal_p = pnorm(q, mu, sig)

# Compare empirical and theoretical CDFs
plot(normal_p ~ q, type = "l", col = "blue")
points(random_p ~ q, col = "red")
Use in Machine learning:
https://machinelearningmastery.com/monte-carlo-sampling-for-probability/

• In machine learning, Monte Carlo methods provide the basis for resampling techniques like the bootstrap method for estimating a quantity, such as the accuracy of a model on a limited dataset.
• Random sampling of model hyperparameters when tuning a model is a Monte Carlo method.
• Ensemble models used to overcome challenges such as the limited size and noise in a small data sample and the stochastic variance in a learning algorithm are all examples of Monte Carlo methods.
• We have seen its use in:
  • Resampling algorithms
  • Random hyperparameter tuning (caret package)
  • Ensemble learning algorithms
A bootstrap sketch follows this list.
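# A minimal sketch (assumed illustration): bootstrap the accuracy of a
# classifier from its per-observation correctness on a limited test set.
set.seed(1234)
correct <- rbinom(50, 1, 0.75)   # stand-in: 1 = correct prediction, 0 = wrong
boot_acc <- replicate(2000, mean(sample(correct, length(correct), replace = TRUE)))
mean(boot_acc)                       # bootstrap estimate of accuracy
quantile(boot_acc, c(0.025, 0.975))  # 95% percentile interval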
Question/queries so far?
Class imbalance problem: Binary dep. var. (y)
• It happens in classification problems.
• When we have a categorical binary dependent variable, the distribution of 1s and 0s may not be equal (it may be very skewed).
• When it is very skewed, it is known as “class imbalance”.
• We can deal with it using statistics or data science.
• Statistical approach: instead of binary logistic regression,
  • Use exact logistic regression
  • Use Poisson regression
  • Use zero-inflated Poisson regression
  • Use negative binomial regression
• Data science approach:
  • Generate new data using simulations, make the classes balanced and get accuracy measures
A hedged sketch of the Poisson-regression option follows.
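# A minimal sketch (assumed illustration) of one statistical approach:
# Poisson regression on a 0/1 outcome estimates relative risks rather
# than odds ratios. Uses the binary.csv data introduced on a later
# slide, with admit kept numeric.
data <- read.csv("binary.csv", header = TRUE)
fit_pois <- glm(admit ~ gre + gpa + rank, family = poisson, data = data)
exp(coef(fit_pois))   # coefficients on the relative-risk scale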
Class imbalance problem: Categorical “y”
• It happens in classification problems.
• When we have a categorical dependent variable, the distribution of 0, 1 and 2 may not be equal (it may be very skewed).
• When it is very skewed, it is known as “class imbalance”.
• We can deal with it using statistics or data science.
• Statistical approach: instead of multinomial or ordinal logistic regression,
  • Use exact multinomial/ordinal logistic regression
  • Use Poisson regression
  • Use zero-inflated Poisson regression
  • Use negative binomial regression
• Data science approach:
  • Generate new data using simulations, make the classes balanced and get accuracy measures
In statistics, we are more concerned with “Simpson’s Paradox” than with the class imbalance problem!
UCLA admission “paradox”: overall, few females were admitted, but more females were admitted when the same data were analyzed by department! The same can happen with all supervised models! A sketch with R’s built-in Berkeley admissions table follows.
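# A minimal sketch of this paradox using UCBAdmissions, the closely
# related UC Berkeley table shipped with R (assumed illustration; the
# UCLA data itself is not bundled with R).
data("UCBAdmissions")
# Overall admission rate by gender, aggregated over departments
overall <- apply(UCBAdmissions, c(1, 2), sum)
round(prop.table(overall, 2), 2)
# Admission rate by gender within each department: the gap shrinks or
# reverses once department is taken into account
round(prop.table(UCBAdmissions, c(2, 3))["Admitted", , ], 2)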
Example: binary.csv data
# Admission to UCLA
• Four variables in the data:
  • admit = admitted or not
  • gre = GRE score
  • gpa = GPA score
  • rank = rank of the institute where they got their GPA

# Class imbalance problem data
data <- read.csv("binary.csv", header = T)
str(data)
summary(data)

# Change admit to a factor variable
data$admit <- as.factor(data$admit)
summary(data)
Outputs:
# summary(data) before converting admit to a factor:
 admit            gre             gpa             rank
 Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000
 Median :0.0000   Median :580.0   Median :3.395   Median :2.000
 Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485
 3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000
 Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000

# summary(data) after converting admit to a factor:
 admit    gre             gpa             rank
 0:273    Min.   :220.0   Min.   :2.260   Min.   :1.000
 1:127    1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000
          Median :580.0   Median :3.395   Median :2.000
          Mean   :587.7   Mean   :3.390   Mean   :2.485
          3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000
          Max.   :800.0   Max.   :4.000   Max.   :4.000

prop.table(table(data$admit))
     0      1
0.6825 0.3175

Class imbalance: the dependent variable “admit” has 273 (68.25%) cases in the 0 (not admitted) category and 127 (31.75%) in the 1 (admitted) category.

In statistics we deal with this using different methods, but in data science we deal with it by making these classes “balanced”.
Let’s predict without correcting imbalance:
# Data partition
#set.seed(1234)
ind <- sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]

# Check the imbalance in the train data
table(train$admit)
  0   1
196  83
prop.table(table(train$admit))
       0        1
0.702509 0.297491
(Is this really imbalance?)
Let’s predict without correcting imbalance:
# Prediction model: random forest
library(randomForest)
rfm.train <- randomForest(admit ~ ., data = train)

# Model evaluation with test data using the caret package
library(caret)
confusionMatrix(predict(rfm.train, test), test$admit, positive = '1')

# Outputs
          Reference
Prediction  0  1
         0 73 32
         1  4 12

Accuracy : 0.7025 (misleading!)
95% CI : (0.6126, 0.7821)
Sensitivity : 0.27273 (not good for 1)
Specificity : 0.94805 (good for 0)

This is due to the “class imbalance” problem!
Let’s predict with correction: Oversampling
# Correcting class imbalance by oversampling, using the ROSE
# (Random Over-Sampling Examples) package
library(ROSE)

over.samp <- ovun.sample(admit ~ ., data = train, method = "over", N = 196*2)$data

# N = 196*2 is used here because the majority class had 196 cases in the
# imbalanced train data:
table(train$admit)
  0   1
196  83

# We will get equal class sizes now:
table(over.samp$admit)
  0   1
196 196

# Resampling of observed values of category 1 is used to get more 1s!
Let’s predict with correction: Oversampling
# Check the summary for changes in the other variables too!
summary(over.samp)

# Random forest model with the oversampled data
rfm.os <- randomForest(admit ~ ., data = over.samp)
confusionMatrix(predict(rfm.os, test), test$admit, positive = '1')

          Reference
Prediction  0  1
         0 59 22
         1 18 22

Accuracy : 0.6694
95% CI : (0.5781, 0.7522)
Sensitivity : 0.5000
Specificity : 0.7662

Sensitivity improved (good if we wanted to improve prediction for 1), but overall accuracy decreased!
What else can be done with the ROSE package?
• We can do undersampling and check the model accuracy, sensitivity and specificity again.
• We can do both, i.e., oversampling and undersampling together, and check the model accuracy, sensitivity and specificity again.
• We can create synthetic data, fit the model and predict with it to check the model accuracy, sensitivity, specificity, etc.
• While creating synthetic data, we must also pass a random seed to the function to get replicable results!
A hedged sketch of these options follows.

More on Synthetic Minority Oversampling (SMOTE) here:
https://www.youtube.com/watch?v=dkXB8HH_4-k
Missing values:
https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e

• Real-world data often have a lot of missing values. The cause of missing values can be data corruption or failure to record data.
• The handling of missing data is very important during the preprocessing of a dataset, as many machine learning algorithms do not support missing values.
• Visit the link to learn more about handling missing values.

The 7 ways to handle missing values in a dataset:
  • Deleting rows with missing values
  • Imputing missing values for continuous variables (mean, median, etc.)
  • Imputing missing values for categorical variables (predict the categories)
  • Other imputation methods
  • Using algorithms that support missing values
  • Prediction of missing values
  • Imputation using the deep learning library Datawig
Missing values checking and handling in R:
# Check missing values in R
colSums(is.na(df))
sum(is.na(df$column_name))

# Strategies
• List-wise deletion
• Pair-wise deletion
• Mean/mode/median imputation
• Generalized imputation
• Similar case imputation
• Prediction model
• KNN imputation

# List of R packages
• MICE
• Amelia
• missForest
• Hmisc
• mi
• etc.

https://medium.com/coinmonks/dealing-with-missing-data-using-r-3ae428da2d17
https://www.analyticsvidhya.com/blog/2016/03/tutorial-powerful-packages-imputing-missing-values/
MICE package:
• MICE (Multivariate Imputation via Chained Equations) is one of the packages commonly used by R users. Creating multiple imputations, as compared to a single imputation (such as the mean), takes care of uncertainty in missing values.
• MICE assumes that the missing data are Missing at Random (MAR), which means that the probability that a value is missing depends only on observed values and can be predicted using them.
• It imputes data on a variable-by-variable basis by specifying an imputation model per variable.
• The methods used by this package are:
  • PMM (Predictive Mean Matching) — for numeric variables
  • logreg (logistic regression) — for binary variables (2 levels)
  • polyreg (Bayesian polytomous regression) — for factor variables (>= 2 levels)
  • Proportional odds model — for ordered and censored variables (>= 2 levels)

More here: https://medium.com/coinmonks/dealing-with-missing-data-using-r-3ae428da2d17
Use of MICE with an example dataset is here: https://www.youtube.com/watch?v=An7nPLJ0fsg
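# A minimal sketch (assumed illustration) using the nhanes example data
# shipped with the mice package.
library(mice)
data(nhanes)                                  # 25 rows with missing values
imp <- mice(nhanes, m = 5, method = "pmm", seed = 1234)  # 5 imputations
fit <- with(imp, lm(bmi ~ age + chl))         # fit the model on each imputed set
summary(pool(fit))                            # pool results across imputations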
Question/queries?
• Next classes:
  • Communicating the results of data science projects
  • Defining projects in RStudio
    • Local file/folder
    • GitHub repository
  • R notebook
Thank you!
@shitalbhandary
