0% found this document useful (0 votes)

43 views13 pages

ShatterLine Blog

The document discusses regularization techniques for machine learning models like linear regression. It introduces regularization as a way to improve predictive accuracy and prevent overfitting. Specifically, it covers ridge regression, which shrinks coefficients equally towards zero, and lasso regression, which sets some coefficients exactly to zero, performing feature selection. It also discusses elastic net regression, which combines lasso and ridge penalties. The document provides an example applying these techniques to a dataset with many correlated predictor variables. Code snippets in R are included to fit models with different regularization penalties and tune hyperparameters.

Uploaded by

kumarahlad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views13 pages

ShatterLine Blog

Uploaded by

kumarahlad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

ShatterLine Blog

Honing the Art of Machine Learning with R

09.12.13
Not-so-Naive Classification with the Naive Bayes Classifier
Posted in Bayes Reasoning at 8:59 pm by Auro Tripathy

A common (and successful) learning method is the Naive Bayes classifier. When supplied with a moderate-to-large
training set to learn from, the Naive Bayes Classifier does a good job of filtering out less relevant attributes and make
good classification decisions. In this article, I introduce the basics of a Naive Bayes classifier, provide an often-cited
example, and provide working R code.

Introduction to Naive Bayes Classifiers

The Naive Bayes classifier is based on Bayes’ theorem with the independence assumptions between features.

The Bayes’ rule (above) plays a central role in the probabilistic reasoning since it helps us ‘invert’ probabilistic
relationships between P(Class | x ) and P(x | Class).

So what’s naive about Naive Bayes?

It naively assumes that the attributes of any instance of the training-set are conditionally independent of each other (in
our example below, the cool temperatures are completely independent of the sunny outlook). We represent this
independence as:

P(x1, x2 …, xk | Classj) = ∏i P(xi, | Classj), or

P(x1, x2 …, xk | Classj) = P(x1 | Classj) × P(x2 | Classj) ×…× P(xk | Classj)

In plain English, if each feature (predictor) x is independent of every other feature, then the probability a data-
point (x1, x2 …, xk) is in Classj is simply the product of all the individual probabilities of feature xi in Classj.
Example

Let’s build a classifier that predicts whether I should play tennis given the forecast. It takes four attributes to describe
the forecast; namely, the outlook, the temperature, the humidity, and the presence or absence of wind. Furthermore the
values of the four attributes are qualitative (also known as categorical). They take on the values shown below.

Outlook ∈ [Sunny, Overcast, Rainy]

Temperature ∈ [Hot, Mild, Cool]
Humidity ∈ [High, Normal]
Windy ∈ [Weak, Strong]

The class label is the variable, Play and takes the values yes or no.

Play∈ [Yes, No]

We read-in training data below that has been collected over 14 days.

The Learning Phase

In the learning phase, we compute the table of likelihoods (probabilities) from the training data. They are:

P(Outlook=o|ClassPlay=b), where o ∈ [Sunny, Overcast, Rainy] and b ∈ [yes, no]

P(Temperature=t|ClassPlay=b), where t ∈ [Hot, Mild, Cool] and b ∈ [yes, no],

P(Humidity=h|ClassPlay=b), where h∈ [High, Norma] and b ∈ [yes, no],

P(Wind=w|ClassPlay=b), where w ∈ [Weak, Strong] and b ∈ [yes, no].

We also calculate P(ClassPlay=Yes) and P(ClassPlay=No).

Classification Phase

Let’s say, we get a new instance of the weather condition, x’=(Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong) that will have to be classified (i.e., are we going to play tennis under the conditions specified by x’).

With the MAP rule, we compute the posterior probabilities. This is easily done by looking up the tables we built in the
learning phase.

P(ClassPlay=Yes|x’) = [P(Sunny|ClassPlay=Yes) × P(Cool|ClassPlay=Yes) ×

P(High|ClassPlay=Yes) × P(Strong|ClassPlay=Yes)] ×

P(ClassPlay=Yes)

= 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053

P(ClassPlay=No|x’) = [P(Sunny|ClassPlay=No) ×P(Cool|ClassPlay=No) ×

P(High|ClassPlay=No) × P(Strong|ClassPlay=No)] ×

P(ClassPlay=No)
= 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0205

Since P(ClassPlay=Yes|x’) less than P(ClassPlay=No|x’), we classify the new instance x’ to be “No”.

The R Code

The R code works with the example dataset above and shows you a programmatic way to invoke the Naive Bayes
classifier in R.

rm(list=ls())
tennis.anyone <- read.table("http://www.shatterline.com/MachineLearning/data/tennis_anyone.csv", he
library(e1071) #naive Bayes classifier library
classifier<-naiveBayes(tennis.anyone[,1:4], tennis.anyone[,5])
table(predict(classifier, tennis.anyone[,-5]), tennis.anyone[,5], dnn=list('predicted','actual'))
classifier$tables
#new data #15
tennis.anyone[15,-5] <- as.factor(c(Outlook = "Sunny", Temperature = "Cool", Humidity = "High", Win

print(tennis.anyone[15,-5] )

result <- predict(classifier, tennis.anyone[15,-5] )

print(result)

Created by Pretty R at inside-R.org

Things t0 watch-out for – data underflow during multiplications

Calculating the product below may cause underflows.

P(x1 | Classj) × P(x2 | Classj) ×…× P(xk | Classj) × P(Classj).

References

Bayesian Reasoning and Machine Learning, by David Barber

http://www.csc.kth.se/utbildning/kth/kurser/DD2431/mi07/07_lecture07_6.pdf

http://www.cs.nyu.edu/faculty/davise/ai/bayesText.html

http://www.saedsayad.com/naive_bayesian.htm

Permalink Comments

07.14.13
Regularization – Predictive Modeling Beyond Ordinary Least Squares Fit
Posted in Linear Regression at 6:07 pm by Auro Tripathy

Introduction to Linear Functions and Regularization

A simple yet powerful prediction model assumes that the function is linear in the input even in cases where the input
consists of hundreds of variables and the input variables far outstrip the number of observations. Such prediction
models, known are linear regression/classification models, can often outperform fancier non-linear models.

The most popular method of estimation of the parameters (used interchangeably with the word, coefficients) is the
method of Ordinary Least Squares (OLS). The linear model can be written as
f(x) = βo + ∑Xj βj

where j=1 to p, X is the input vector, and βjs are the unknown coefficients.

We solve for the coefficients β (βo,β1,…βp) that minimize the residual sum of squares (RSS).

RSS() = ∑(yi – βo - ∑Xijβj)2

where i=1 to N observations, j= 1 to p variables

Reasons why OLS estimation is often unsatisfactory are:

1. Large variance in prediction accuracy. A solution to improving the overall accuracy is to shrink (or set to
zero) some of the coefficients. The overall effect is to prevent or reduce over-fitting.
2. With a large number of input predictors, one would like to determine a smaller subset that would exhibit the
strongest effects so we see the big picture.

The process of regularization involves a family of penalty terms that can be added to OLS to achieve the shrinkage (in
the coefficients).

The Ridge penalty term shrinks the regression coefficients by introducing the complexity parameter, λ, the greater the
value of λ, the greater the amount of shrinkage. By varying λ, the coefficients are shrunk towards zero (and to each
other).

λ ∑ βj2 where j=1 to p

While the Ridge penalty does a proportional shrinkage, the LASSO penalty λ, translates each coefficients by a constant
factor stopping at zero. LASSO also does feature-selection; if many features are correlated, LASSO will just pick one.

λ ∑ |βj|, where j=1 to p

Elastic Net penalty is a combination of the LASSO and Ridge regression penalty.

λ∑( α|βj| + (1 – α)βj2 ), where j=1 to p

The first term encourages a sparse solution in the coefficients and the second term encourages highly correlated features
to be averaged. The parameter α determines the mix of penalties and lies in the range of 0 and 1. With α set to 0, we get
the Ridge penalty and with α set to 1, we get the LASSO penalty.

Example

We now demonstrate this with a example dataset with 204 binary attributes and 704 observations.

Getting the Data

The R snippet below will download the dataset from where it is hosted. The data has been previously saved as an R
object in the .rda format. We reload it back in to the R object, hiv.data.

download.file("http://www.shatterline.com/MachineLearning/data/hiv.rda","hiv.rda", mode="wb")

load("hiv.rda", verbose=TRUE) #contains hiv.train & hiv.test

Visualizing the Data

The image function in R helps us visualize the dataset. You can see below the relatively strong correlation between the
variables. See the visualize.matrix function below.
Fitting/Plotting Data

The code snippet below shows the coefficient shrinkage is proportional to λ when we apply the Ridge penalty.
fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0) #Ridge penalty

The code snippet below shows that, with the LASSO penalty, the coefficient hit zero (unlike Ridge) as λ shrinks.
fit <- glmnet(hiv.train$x,hiv.train$y, alpha=1) #Lasso penalty
The code snippet below shows a mix of the Ridge and LASSO penalties with the Elastic Net penalty for a specific value
of α=0.2 (could also be chose with cross-validation).
fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0.2) #Elastic Net penalty

Cross-Validation

Ten-fold cross-validation shows us that the number of active variables are approximately 30.

cv.fit <- cv.glmnet(hiv.train$x,hiv.train$y) #10-fold cross-validation

plot(cv.fit)
legend("topleft",legend=c("10-fold Cross Validation"))
Predicting the Test Data with the
Model

The code snippet below predicts the error at every value of λ.
pred.y <- predict(fit, hiv.test$x) #predict the test data
mean.test.error <- apply((pred.y - hiv.test$y)^2,2,mean)
points(log(fit$lambda), mean.test.error, col="blue",pch="*")
legend("topleft",legend=c("10-fold Cross Validation","Test HIV Data"),pch="*",col=c("red","blue"))

Plotting the Regularization Path

The code snippet below shows the regularization path by plotting the coefficients against (log of) λ. Each curve
represents a coefficient in the model. As λ gets smaller, more coefficients enter the model from a zero value. (see to the
left).
plot(fit,xvar="lambda")
Code
# Author Auro Tripathy, [email protected]
# Adapted from ...Trevor Hastie's talk
rm(list=ls())

visualize.matrix <- function(mat) {

print(names(mat))

image(1:nrow(mat$x), 1:ncol(mat$x), z=mat$x,

col = c("darkgreen", "white"),
xlab = "Observations", ylab = "Attributes")

title(main = "Visualizing the Sparse Binary Matrix",

font.main = 4)
return (dim(mat$x)) #returns the dimensions of the matrix
}

#---main---#
library(glmnet)
?glmnet
download.file("http://www.shatterline.com/MachineLearning/data/hiv.rda",
"hiv.rda", mode="wb")
load("hiv.rda",
verbose=TRUE) #contains hiv.train & hiv.test
visualize.matrix(hiv.train)
visualize.matrix(hiv.test)

print(length(hiv.train$y)) #length of response variable

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0) #Ridge penalty

plot(fit)
legend("bottomleft",legend=c("Ridge Penalty, alpha=0"))

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=1) #Lasso penalty

plot(fit)
legend("bottomleft",legend=c("LASSO Penalty, alpha=1"))

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0.2) #ElasticNet penalty

plot(fit)
legend("bottomleft",legend=c("Elastic Net, alpha=0.2"))

cv.fit <- cv.glmnet(hiv.train$x,hiv.train$y) #10-fold cross-validation

plot(cv.fit)
legend("topleft",legend=c("10-fold Cross Validation"))
pred.y <- predict(fit, hiv.test$x) #predict the test data
mean.test.error <- apply((pred.y - hiv.test$y)^2,2,mean)
points(log(fit$lambda), mean.test.error, col="blue",pch="*")
legend("topleft",legend=c("10-fold Cross Validation","Test HIV Data"), pch="*", col=c("red","blue"))
plot(fit,xvar="lambda")
plot(fit,xvar="dev")

References
1. Prof Trever Hastie’s talk
2. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie , Robert Tibshirani , Jerome Friedman

Permalink Comments

06.23.13
Heart-Disease Predictor Using Logistic Regression
Posted in Linear Regression at 10:48 am by Auro Tripathy

Probability is the very guide of life.

- Cicero

Given a two-column dataset, column one being age and column two being the presence/absence of heart-disease, we
build a model (in R) that predicts the probability of heart-disease at an age. For a realistic model we aught to have big
datasets with additional predictor variables such as blood-pressure, cholesterol, diabetes, smoking etc. However, the
one-and-only predictor variable we have is age and the sample-size is 100 subjects!

Plotting the data (see below) doesn’t really provide a clear picture of the nature of the relationship between heart-disease
and age. The problem is that the response variable (presence/absence of heart disease) is binary.

Let’s create intervals of the independent variable (age) and compute the frequency of occurrence of the response
variable (presence/absence of heart disease). You can get the table below here.

A short and lucid tutorial in logistic regression is here (text) and here (video). The logistic curve is an S-shaped curve
that takes the form,
y = [exp(b0 + b1x)] / [1 + exp(b0 + b1x)]

Clearly, the curve is non-linear, but the logit-transform makes it linear.

logit(y) = b0 + b1x

Thus, logistic regression is linear regression on the logit transform of y, where y is the probability of success at each
value of x. Logistic regression fits b0 and b1, the regression coefficients.

The glm package in R is used to fit generalized regression models and can be used for logistic regression by specifying
the family parameter to be binomial with the logit link like so:

> glm.out = glm(cbind(chd.present, chd.absent) ~ age.mean,

+ family=binomial(logit), data=frequency.coronary.data)

Plotting the fit shows us the close relationship between the fitted values and the observed values.

Below is the R code that generated the plots.

rm(list=ls())
coronary.data <- read.table("http://www.shatterline.com/MachineLearning/data/AGE-CHD-Y-N.txt",
header=TRUE)
plot(CHD ~ Age, data=coronary.data, col="red")
title(main="Scatterplot of presence/absence of \ncoronary heart disease by age \nfor 100 subjects")

library(calibrate) #needed to label observation

frequency.coronary.data <- read.table("http://www.shatterline.com/MachineLearning/data/frequency-ta
header=TRUE)
frequency.coronary.data[,"age.mean"] <- (frequency.coronary.data$age.start +
frequency.coronary.data$age.end)/2
frequency.coronary.data <- frequency.coronary.data[, c(1,2,6,3,4,5)] #reorder cols
#With "family=" set to "binomial" with a "logit" link,
# glm( ) produces a logistic regression
glm.fit = glm(cbind(chd.present, chd.absent) ~ age.mean,
family=binomial(logit), data=frequency.coronary.data)

summary(glm.fit)
plot(chd.present/age.group.total ~ age.mean, data=frequency.coronary.data)
lines(frequency.coronary.data$age.mean, glm.fit$fitted, type="l", col="red")

textxy(frequency.coronary.data$age.mean,
frequency.coronary.data$chd.present/frequency.coronary.data$age.group.total,
frequency.coronary.data$age.mean, cx=0.6)
title(main="Percentage of subjects with heart disease in each age group")

Created by Pretty R at inside-R.org

References

1. http://www.youtube.com/watch?v=qSTHZvN8hzs&list=WL980F0C0E5B4CD53D#t=24m03s
2. http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html
3. Applied Logistic Regression, David W. Hosmer, Jr., Stanley Lemeshow, Rodney X. Sturdivant

Permalink Comments

06.17.13
My First Brush w/Open Data – Hospital Charges
Posted in Data Visualization at 4:35 am by Auro Tripathy

Curious about what a medical procedure may cost you? Then, read on…

Recent data on the top 100 medical procedures is available here. The Government will soon release data on yet another
30 procedures. Below is a box plot showing the bewildering variation in in-patient cost for medical procedures
nationwide.

The R code below can be executed, without changes, to generate the plot above.

You can also use openrefine to discover that a medical procedure with code 207 can cost up to a million dollars!

# Author Auro Tripathy, [email protected]

# The box plot code, written in R is reproducible and is licensed under Creative Commons, Attributi

# Medicare Provider Charge Data: Inpatient

# The data provided here include hospital-specific charges for the more than 3,000 U.S. hospitals
# that receive Medicare Inpatient Prospective Payment System (IPPS) payments for the top 100 most
# frequently billed discharges, paid under Medicare based on a rate per discharge using the
# Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011. These DRGs
# represent almost 7 million discharges or 60 percent of total Medicare IPPS discharges.

# Read further and get the data from the link below
# http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Pr

rm(list=ls())
temp.zipped <- tempfile()
download.file("http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Report
temp.zipped)
hospital.charges <- read.csv(unz(temp.zipped, "Medicare_Provider_Charge_Inpatient_DRG100_FY2011.csv
unlink(temp.zipped)
dim(hospital.charges)

#min/max needed to bound the plot

max <- max(hospital.charges$Average.Covered.Charges)
min <- min(hospital.charges$Average.Covered.Charges)

#if you want to study the data further, use openrefine (aka Google Refine)
colnames(hospital.charges)
unique(hospital.charges$DRG.Definition)
unique(hospital.charges$Provider.Zip.Code)
unique(hospital.charges$Provider.Name)
unique(hospital.charges$Provider.City)

procedures <- unique(hospital.charges$DRG.Definition)

#procedure.by.charges.table
procedure.charges.array <- array(list(NULL), c(100))

for (i in 1:length(procedures)) {

procedure.charges <- hospital.charges[which(hospital.charges$DRG.Definition == procedures[i]), ]

print(nrow(procedure.charges))

#used in the box-and-whiskers plot below

procedure.charges.array[[i]] <- array(procedure.charges$Average.Covered.Charges, dim=nrow(procedu

#retain the three-digit medical code

procedures.labels <- as.character(procedures)
for (i in 1:length(procedures.labels)) {
procedures.labels[i] <- substr(procedures.labels[i], 1, 3)
}

#boxplot to show the media, the quartiles, and the outliers

boxplot(x = procedure.charges.array, main="Boxplot Showing Variation in In-Patient Cost for Medical
xlab="Medical Procedure Code",
col = c("lightgreen", "brown2", "cyan4"),
ylim=c(min, max), yaxt="n", col.ticks = "red", col.axis = "azure4", names=procedures.labels

axis(2, axTicks(2), labels=sprintf("$%2d", axTicks(2)), las=1)

Created by Pretty R at inside-R.org

Permalink Comments

Archives
September 2013
July 2013
June 2013

Categories
Bayes Reasoning
Data Visualization
Linear Regression

Search
Go

Design by Beccary · Sponsored by Weblogs.us · XHTML · CSS

Basic Micro Economics Module 2020 2021
No ratings yet
Basic Micro Economics Module 2020 2021
111 pages
NSC Maths Grade 12 November 2024 P2 and Memo
No ratings yet
NSC Maths Grade 12 November 2024 P2 and Memo
63 pages
The Official Shea Davis - Ahs Resume This Is The One Your Looking For
No ratings yet
The Official Shea Davis - Ahs Resume This Is The One Your Looking For
2 pages
Cognitive-Distortions Charm Dalupang 1MT-L USELF PDF
100% (1)
Cognitive-Distortions Charm Dalupang 1MT-L USELF PDF
2 pages
Me 232
No ratings yet
Me 232
185 pages
Workbook On Math Math Grade 6
100% (1)
Workbook On Math Math Grade 6
198 pages
CE CE131P 2 SyllabusModular
No ratings yet
CE CE131P 2 SyllabusModular
14 pages
The Heat Capacity of Ilmenite and Phase
No ratings yet
The Heat Capacity of Ilmenite and Phase
14 pages
Flowchart: A Pictorial Form of An Algorithm Is Known As A Flowchart
No ratings yet
Flowchart: A Pictorial Form of An Algorithm Is Known As A Flowchart
28 pages
Lesson 1.2. Intellectual Revolutions That Defined Society
No ratings yet
Lesson 1.2. Intellectual Revolutions That Defined Society
13 pages
Best Practice in Nickel Laterite Exploration and Resource Evaluation
83% (6)
Best Practice in Nickel Laterite Exploration and Resource Evaluation
68 pages
Mtec 115 - Workshop Theory and Practice III B Second Semester SY 2019 - 2020 Course Completion Hacksaw
No ratings yet
Mtec 115 - Workshop Theory and Practice III B Second Semester SY 2019 - 2020 Course Completion Hacksaw
10 pages
Ethics - Module
No ratings yet
Ethics - Module
3 pages
Practice 14 Practice Tests Set 14 - Paper 1H Mark Scheme
No ratings yet
Practice 14 Practice Tests Set 14 - Paper 1H Mark Scheme
15 pages
Ballast
No ratings yet
Ballast
15 pages
Topological Indices of Molecular Graph and Drug Design
No ratings yet
Topological Indices of Molecular Graph and Drug Design
5 pages
Grade 6 Investigation Marking Guideline
No ratings yet
Grade 6 Investigation Marking Guideline
5 pages
Sample Exam Questions With Answers
No ratings yet
Sample Exam Questions With Answers
5 pages
T G Butaney - Forecasting Prices (1958) PDF
100% (3)
T G Butaney - Forecasting Prices (1958) PDF
280 pages
4 MGP 2024 Cohort 7 General Studies September 24, 0023-09-30 AM
No ratings yet
4 MGP 2024 Cohort 7 General Studies September 24, 0023-09-30 AM
54 pages
GL Marine Cargo Logistics Trends Report 2023
No ratings yet
GL Marine Cargo Logistics Trends Report 2023
19 pages
ENVE 201 Tanıtım
No ratings yet
ENVE 201 Tanıtım
7 pages
Ahmed Mahmoud
No ratings yet
Ahmed Mahmoud
1 page
Term Paper Numerical Analysis Final
No ratings yet
Term Paper Numerical Analysis Final
15 pages
Tutorial 7
No ratings yet
Tutorial 7
2 pages
Project-Based Learning and Assessment: Dr. Teresa Yohon
No ratings yet
Project-Based Learning and Assessment: Dr. Teresa Yohon
17 pages
Imran CV
No ratings yet
Imran CV
2 pages
Analysis of Motor Units With High-Density Surface Electromyography
No ratings yet
Analysis of Motor Units With High-Density Surface Electromyography
12 pages
Mahabharat Reflection in Management
No ratings yet
Mahabharat Reflection in Management
3 pages
Gabriel Taborin College of Davao Foundation, Inc
No ratings yet
Gabriel Taborin College of Davao Foundation, Inc
17 pages
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
From Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
4/5 (6458)
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
4/5 (643)
Never Split the Difference: Negotiating As If Your Life Depended On It
From Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
4.5/5 (1005)
Grit: The Power of Passion and Perseverance
From Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
4/5 (650)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
From Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
4.5/5 (141)
A Man Called Ove: A Novel
From Everand
A Man Called Ove: A Novel
Fredrik Backman
4.5/5 (5181)
The Little Book of Hygge: Danish Secrets to Happy Living
From Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
3.5/5 (464)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brene Brown
4/5 (1175)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
From Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
4.5/5 (361)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
From Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
4.5/5 (582)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
4.5/5 (1856)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
4.5/5 (298)
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
4.5/5 (1139)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
4.5/5 (629)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
3.5/5 (2289)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
4/5 (2885)
Yes Please
From Everand
Yes Please
Amy Poehler
4/5 (2016)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
3.5/5 (144)
The Woman in Cabin 10
From Everand
The Woman in Cabin 10
Ruth Ware
3.5/5 (2814)
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
From Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
4/5 (1022)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
4.5/5 (244)
Bad Feminist: Essays
From Everand
Bad Feminist: Essays
Roxane Gay
4/5 (1090)
Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
3.5/5 (836)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M.L. Stedman
4.5/5 (815)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
4.5/5 (4103)
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
4/5 (45)
John Adams
From Everand
John Adams
David McCullough
4.5/5 (2546)
The Constant Gardener: A Novel
From Everand
The Constant Gardener: A Novel
John le Carré
4/5 (278)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
4/5 (903)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
4.5/5 (943)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
4/5 (100)
Wolf Hall: A Novel
From Everand
Wolf Hall: A Novel
Hilary Mantel
4/5 (4135)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
4/5 (1267)
A Tree Grows in Brooklyn
From Everand
A Tree Grows in Brooklyn
Betty Smith
4.5/5 (2033)
On Fire: The (Burning) Case for a Green New Deal
From Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
4/5 (78)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
3.5/5 (233)
The Art of Racing in the Rain: A Novel
From Everand
The Art of Racing in the Rain: A Novel
Garth Stein
4/5 (4372)
Little Women
From Everand
Little Women
Louisa May Alcott
4.5/5 (2369)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
From Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
4.5/5 (280)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
3.5/5 (919)
Brooklyn: A Novel
From Everand
Brooklyn: A Novel
Colm Toibin
3.5/5 (2133)

ShatterLine Blog

Uploaded by

ShatterLine Blog

Uploaded by

ShatterLine Blog

Honing the Art of Machine Learning with R

Introduction to Naive Bayes Classifiers

So what’s naive about Naive Bayes?

P(x1, x2 …, xk | Classj) = ∏i P(xi, | Classj), or

Outlook ∈ [Sunny, Overcast, Rainy]

Play∈ [Yes, No]

The Learning Phase

P(Outlook=o|ClassPlay=b), where o ∈ [Sunny, Overcast, Rainy] and b ∈ [yes, no]

P(Temperature=t|ClassPlay=b), where t ∈ [Hot, Mild, Cool] and b ∈ [yes, no],

P(Humidity=h|ClassPlay=b), where h∈ [High, Norma] and b ∈ [yes, no],

P(Wind=w|ClassPlay=b), where w ∈ [Weak, Strong] and b ∈ [yes, no].

P(ClassPlay=Yes|x’) = [P(Sunny|ClassPlay=Yes) × P(Cool|ClassPlay=Yes) ×

= 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053

P(ClassPlay=No|x’) = [P(Sunny|ClassPlay=No) ×P(Cool|ClassPlay=No) ×

result <- predict(classifier, tennis.anyone[15,-5] )

Created by Pretty R at inside-R.org

Things t0 watch-out for – data underflow during multiplications

Calculating the product below may cause underflows.

P(x1 | Classj) × P(x2 | Classj) ×…× P(xk | Classj) × P(Classj).

Bayesian Reasoning and Machine Learning, by David Barber

Introduction to Linear Functions and Regularization

where j=1 to p, X is the input vector, and βjs are the unknown coefficients.

RSS() = ∑(yi – βo - ∑Xijβj)2

where i=1 to N observations, j= 1 to p variables

Reasons why OLS estimation is often unsatisfactory are:

λ ∑ βj2 where j=1 to p

λ ∑ |βj|, where j=1 to p

Elastic Net penalty is a combination of the LASSO and Ridge regression penalty.

λ∑( α|βj| + (1 – α)βj2 ), where j=1 to p

Getting the Data

load("hiv.rda", verbose=TRUE) #contains hiv.train & hiv.test

Visualizing the Data

cv.fit <- cv.glmnet(hiv.train$x,hiv.train$y) #10-fold cross-validation

Plotting the Regularization Path

visualize.matrix <- function(mat) {

image(1:nrow(mat$x), 1:ncol(mat$x), z=mat$x,

title(main = "Visualizing the Sparse Binary Matrix",

print(length(hiv.train$y)) #length of response variable

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0) #Ridge penalty

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=1) #Lasso penalty

fit <- glmnet(hiv.train$x,hiv.train$y, alpha=0.2) #ElasticNet penalty

cv.fit <- cv.glmnet(hiv.train$x,hiv.train$y) #10-fold cross-validation

Probability is the very guide of life.

Clearly, the curve is non-linear, but the logit-transform makes it linear.

> glm.out = glm(cbind(chd.present, chd.absent) ~ age.mean,

Below is the R code that generated the plots.

library(calibrate) #needed to label observation

Created by Pretty R at inside-R.org

# Author Auro Tripathy, [email protected]

# Medicare Provider Charge Data: Inpatient

#min/max needed to bound the plot

procedures <- unique(hospital.charges$DRG.Definition)

procedure.charges <- hospital.charges[which(hospital.charges$DRG.Definition == procedures[i]), ]

#used in the box-and-whiskers plot below

#retain the three-digit medical code

#boxplot to show the media, the quartiles, and the outliers

axis(2, axTicks(2), labels=sprintf("$%2d", axTicks(2)), las=1)

Created by Pretty R at inside-R.org

Design by Beccary · Sponsored by Weblogs.us · XHTML · CSS

You might also like