
CRAN Task View: Machine Learning & Statistical Learning
http://cran.r-project.org/web/views/MachineLearning.html

Maintainer: Torsten Hothorn
Contact: Torsten.Hothorn at R-project.org
Version: 2010-12-10
Several add-on packages implement ideas and methods developed at the borderline between computer
science and statistics - this field of research is usually referred to as machine learning. The packages can be
roughly structured into the following topics:

Neural Networks : Single-hidden-layer neural networks are implemented in package nnet (shipped with
base R). Package RSNNS offers an interface to the Stuttgart Neural Network Simulator (SNNS).
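As a quick illustration (not part of the task view itself), a single-hidden-layer network can be fitted with nnet; the sketch below assumes the built-in iris data and an illustrative hidden-layer size of 2:

```r
# Sketch: a single-hidden-layer network on iris (nnet is a
# recommended package shipped with standard R installations).
library(nnet)

set.seed(1)                      # nnet uses random starting weights
fit <- nnet(Species ~ ., data = iris, size = 2, trace = FALSE)

# Classify the training data and compute the training accuracy.
pred <- predict(fit, iris, type = "class")
mean(pred == iris$Species)
```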
Recursive Partitioning : Tree-structured models for regression, classification and survival analysis,
following the ideas in the CART book, are implemented in rpart (shipped with base R) and tree.
Package rpart is recommended for computing CART-like trees. A rich toolbox of partitioning
algorithms is available in Weka; package RWeka provides an interface to this implementation,
including the J4.8-variant of C4.5 and M5.
Two recursive partitioning algorithms with unbiased variable selection and statistical stopping criterion
are implemented in package party. Function ctree() is based on non-parametric conditional
inference procedures for testing independence between the response and each input variable, whereas
mob() can be used to partition parametric models. Extensible tools for visualizing binary trees and node
distributions of the response are available in package party as well.
An adaptation of rpart for multivariate responses is available in package mvpart. A tree algorithm fitting
nearest neighbors in each node is implemented in package knnTree. For problems with binary input
variables the package LogicReg implements logic regression. Graphical tools for the visualization of
trees are available in packages maptree and pinktoe. An approach to deal with the instability problem
via extra splits is available in package TWIX.
Trees for modelling longitudinal data by means of random effects are offered by package REEMtree.
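A minimal sketch of a CART-like tree with rpart (the iris data and the printed confusion matrix are illustrative, not from the task view):

```r
# Sketch: a CART-style classification tree with rpart on iris
# (rpart is a recommended package shipped with standard R installations).
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)                       # text representation of the splits

# Predicted classes and the resulting confusion matrix.
pred <- predict(fit, iris, type = "class")
table(pred, iris$Species)
```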
Random Forests : The reference implementation of the random forest algorithm for regression and
classification is available in package randomForest. Package ipred has bagging for regression,
classification and survival analysis as well as bundling, a combination of multiple models via ensemble
learning. In addition, a random forest variant for response variables measured at arbitrary scales based
on conditional inference trees is implemented in package party. randomSurvivalForest offers a random
forest algorithm for censored data. Quantile regression forests (package quantregForest) allow quantiles
of a numeric response to be regressed on explanatory variables via a random forest approach. For binary data,
LogicForest grows a forest of logic regression trees (package LogicReg). The varSelRF and Boruta packages
focus on variable selection by means of random forest algorithms.
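A minimal sketch with the reference implementation (assumes randomForest has been installed from CRAN; iris and the parameter choices are illustrative):

```r
# Sketch: random forest classification with the reference implementation
# (assumes install.packages("randomForest") was run beforehand).
library(randomForest)

set.seed(1)
fit <- randomForest(Species ~ ., data = iris, ntree = 500,
                    importance = TRUE)

fit$confusion        # out-of-bag confusion matrix
importance(fit)      # permutation and Gini importance per variable
```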
Regularized and Shrinkage Methods : Regression models with some constraint on the parameter
estimates can be fitted with the lasso2 and lars packages. Lasso with simultaneous updates for groups of
parameters (groupwise lasso) is available in package grplasso. The L1 regularization path for
generalized linear models and Cox models can be obtained from functions available in package
glmpath, the entire lasso or elastic-net regularization path (also in elasticnet) for linear regression,
logistic and multinomial regression models can be obtained from package glmnet. The penalized
package provides an alternative implementation of lasso (L1) and ridge (L2) penalized regression
models (both GLM and Cox models). A generalisation of the Lasso shrinkage technique for linear
regression is called relaxed lasso and is available in package relaxo. The shrunken centroids classifier
and utilities for gene expression analyses are implemented in package pamr. An implementation of
multivariate adaptive regression splines is available in package earth. Feature selection for
SVMs via penalized models (SCAD or L1 penalties) is implemented in package
penalizedSVM. Various forms of penalized discriminant analysis are implemented in packages hda, rda,
sda, and SDDA. Package LiblineaR offers an interface to the LIBLINEAR library.
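A minimal sketch of a cross-validated lasso fit with glmnet (assumes glmnet has been installed from CRAN; the built-in mtcars data is an illustrative choice):

```r
# Sketch: lasso regularization path with cross-validated lambda
# (assumes install.packages("glmnet") was run beforehand).
library(glmnet)

x <- as.matrix(mtcars[, -1])   # predictors
y <- mtcars$mpg                # numeric response

set.seed(1)
cvfit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 gives the lasso
coef(cvfit, s = "lambda.min")         # coefficients at the CV-optimal lambda
```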


Boosting : Various forms of gradient boosting are implemented in package gbm (tree-based functional
gradient descent boosting). Package GAMBoost can be used to fit generalized additive models by a
boosting algorithm. An extensible boosting framework for generalized linear, additive and
nonparametric models is available in package mboost. Likelihood-based boosting for Cox models is
implemented in package CoxBoost.
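A minimal sketch of tree-based gradient boosting with gbm (assumes gbm has been installed from CRAN; mtcars and the shrinkage/depth settings are illustrative):

```r
# Sketch: tree-based functional gradient descent boosting with gbm
# (assumes install.packages("gbm") was run beforehand).
library(gbm)

set.seed(1)
fit <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
           n.trees = 1000, shrinkage = 0.01,
           interaction.depth = 2, cv.folds = 5)

# Pick the iteration that minimizes the cross-validated error,
# then predict with that many trees.
best <- gbm.perf(fit, method = "cv", plot.it = FALSE)
pred <- predict(fit, mtcars, n.trees = best)
```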
Support Vector Machines and Kernel Methods : The function svm() from e1071 offers an interface to
the LIBSVM library and package kernlab implements a flexible framework for kernel learning
(including SVMs, RVMs and other kernel learning algorithms). An interface to the SVMlight
implementation (only for one-against-all classification) is provided in package klaR. The relevant
dimension in kernel feature spaces can be estimated using rdetools which also offers procedures for
model selection and prediction.
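A minimal sketch of the LIBSVM interface in e1071 (assumes e1071 has been installed from CRAN; the radial kernel and cost value are illustrative defaults):

```r
# Sketch: an SVM with a radial kernel via the LIBSVM interface
# (assumes install.packages("e1071") was run beforehand).
library(e1071)

fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
pred <- predict(fit, iris)
table(pred, iris$Species)    # training confusion matrix
```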
Bayesian Methods : Bayesian Additive Regression Trees (BART), where the final model is defined in
terms of the sum over many weak learners (not unlike ensemble methods), are implemented in package
BayesTree. Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian
processes including Bayesian CART and treed linear models are made available by package tgp.
Bayesian logistic regression models that consider high-order interactions are available from package
BPHO, and Bayesian naive Bayes models for binary classification with bias-corrected feature selection
are implemented in package predbayescor.
Optimization using Genetic Algorithms : Packages gafit, rgp, and rgenoud offer optimization routines
based on genetic algorithms.
Association Rules : Package arules provides both data structures for efficient handling of sparse binary
data and interfaces to implementations of Apriori and Eclat for mining frequent itemsets, maximal
frequent itemsets, closed frequent itemsets and association rules.
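A minimal sketch of Apriori mining with arules (assumes arules has been installed from CRAN; the Groceries data ships with the package, and the support/confidence thresholds are illustrative):

```r
# Sketch: mining association rules with Apriori
# (assumes install.packages("arules") was run beforehand).
library(arules)

data(Groceries)                        # sparse transaction data
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))

# Show the three rules with the highest lift.
inspect(sort(rules, by = "lift")[1:3])
```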
Model selection and validation : Package e1071 has function tune() for hyperparameter tuning, and
function errorest() (ipred) can be used for error rate estimation. The cost parameter C for support
vector machines can be chosen utilizing the functionality of package svmpath. Functions for ROC
analysis and other visualisation techniques for comparing candidate classifiers are available from
package ROCR. Package caret provides miscellaneous functions for building predictive models,
including parameter tuning and variable importance measures. The package can be used with various
parallel implementations (e.g. MPI, NWS).
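A minimal sketch of grid-search tuning with tune() from e1071 (the cost and gamma grids below are illustrative choices, not package defaults):

```r
# Sketch: grid-search hyperparameter tuning via cross-validation
# (assumes install.packages("e1071") was run beforehand).
library(e1071)

set.seed(1)
tuned <- tune(svm, Species ~ ., data = iris,
              ranges = list(cost = 2^(-1:3), gamma = 2^(-2:1)))

tuned$best.parameters    # CV-selected cost and gamma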
Elements of Statistical Learning : Data sets, functions and examples from the book The Elements of
Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and
Jerome Friedman have been packaged and are available as ElemStatLearn.

CRAN packages:

arules
BayesTree
Boruta
BPHO
caret
CoxBoost
e1071 (core)
earth
elasticnet
ElemStatLearn
gafit
GAMBoost
gbm (core)
glmnet
glmpath


grplasso
hda
ipred
kernlab (core)
klaR
lars
lasso2
LiblineaR
LogicForest
LogicReg
mboost (core)
mvpart
nnet (core)
pamr
party
penalized
penalizedSVM
predbayescor
quantregForest
randomForest (core)
randomSurvivalForest
rda
rdetools
REEMtree
relaxo
rgenoud
rgp
ROCR
rpart (core)
RSNNS
RWeka
sda
SDDA
svmpath
tgp
tree
TWIX
varSelRF

Related links:

MLOSS: Machine Learning Open Source Software


Boosting Research Site
