R Reference Card For Data Mining
by Yanchang Zhao, [email protected], April 1, 2011
The latest version is available at http://www.rdatamining.com. See the same site for the document R and Data Mining: Examples and Case Studies, an introduction to using R for data mining applications.
The package name is in parentheses.

Association Rules & Frequent Itemsets

APRIORI Algorithm
a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets
apriori() mine associations with APRIORI algorithm (arules)

ECLAT Algorithm
employs equivalence classes, depth-first search and set intersection instead of counting
eclat() mine frequent itemsets with the Eclat algorithm (arules)

Packages
arules mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. It includes two algorithms, Apriori and Eclat.
arulesViz visualizing association rules

Sequential Patterns

Functions
cspade() mining frequent sequential patterns with the cSPADE algorithm (arulesSequences)
seqefsub() searching for frequent subsequences (TraMineR)

Packages
arulesSequences add-on for arules to handle and mine frequent sequences
TraMineR mining, describing and visualizing sequences of states or events

Classification & Prediction

Decision Trees
ctree() conditional inference trees, recursive partitioning for continuous, censored, ordered, nominal and multivariate response variables in a conditional inference framework (party)
rpart() recursive partitioning and regression trees (rpart)
mob() model-based recursive partitioning, yielding a tree with fitted models associated with each terminal node (party)

Random Forest
cforest() random forest and bagging ensemble (party)
randomForest() random forest (randomForest)

Packages
rpart recursive partitioning and regression trees
party recursive partitioning
randomForest classification and regression based on a forest of trees using random inputs
rpartOrdinal ordinal classification trees, deriving a classification tree when only the response to be predicted is ordinal
[...]part models with an enhanced version of plot.rpart in the rpart package

Regression

Functions
lm() linear regression
glm() generalized linear regression
nls() non-linear regression
predict() predict with models
residuals() residuals, the difference between observed values and fitted values
gls() fit a linear model using generalized least squares (nlme)
gnls() fit a nonlinear model using generalized least squares (nlme)

Packages
nlme linear and nonlinear mixed effects models

Clustering

Partitioning based Clustering
partition the data into k groups first and then try to improve the quality of clustering by moving objects from one group to another
kmeans() perform k-means clustering on a data matrix
pam() the Partitioning Around Medoids (PAM) clustering method (cluster)
kmeansCBI() interface function for clustering methods (fpc)
kmeansruns() call kmeans for the k-means clustering method, including estimation of the number of clusters and finding an optimal solution from several starting points (fpc)
cluster.optimal() search for the optimal k-clustering of the dataset (bayesclust)
pamk() the Partitioning Around Medoids (PAM) clustering method with estimation of number of clusters (fpc)
clara() Clustering Large Applications (cluster)
fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (cluster)
kcca() k-centroids clustering (flexclust)
ccfkms() clustering with Conjugate Convex Functions
apcluster() affinity propagation clustering for a given similarity matrix (apcluster)
apclusterK() affinity propagation clustering to get K clusters (apcluster)
cclust() Convex Clustering, incl. k-means and two other clustering algorithms (cclust)
KMeansSparseCluster() sparse k-means clustering (sparcl)
tclust(x,k,alpha,...) trimmed k-means with which a proportion alpha of observations may be trimmed (tclust)

Hierarchical Clustering
a hierarchical decomposition of data in either bottom-up (agglomerative) or top-down (divisive) way
hclust(d, method, ...) hierarchical cluster analysis on a set of dissimilarities d using the method for agglomeration
pvclust() hierarchical clustering with p-values via multi-scale bootstrap resampling (pvclust)
agnes() agglomerative hierarchical clustering (cluster)
diana() divisive hierarchical clustering (cluster)
mona() divisive hierarchical clustering of a dataset with binary variables (cluster)
rockCluster() cluster a data matrix using the Rock algorithm (cba)
proximus() cluster the rows of a logical matrix using the Proximus algorithm (cba)
isopam() Isopam clustering algorithm (isopam)
LLAhclust() hierarchical clustering based on likelihood linkage analysis (LLAhclust)
flashClust() optimal hierarchical clustering (flashClust)
fastcluster() fast hierarchical clustering (fastcluster)
cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchical clustering dendrograms (dynamicTreeCut)
HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)

Model based Clustering
Mclust() model-based clustering (mclust)
HDDC() a model-based method for high dimensional data clustering (HDclassif)
fixmahal() Mahalanobis Fixed Point Clustering (fpc)
fixreg() Regression Fixed Point Clustering (fpc)
mergenormals() clustering by merging Gaussian mixture components (fpc)

Density based Clustering
generate clusters by connecting dense regions
dbscan(data,eps,MinPts,...) generate a density based clustering of arbitrary shape, given the parameters eps and MinPts (fpc)
pdfCluster() clustering via kernel density estimation (pdfCluster)

Other Clustering Techniques
mixer() random graph clustering (mixer)
nncluster() fast clustering with restarted minimum spanning tree (nnclust)
orclus() ORCLUS subspace clustering (orclus)

Plotting Clustering Solutions
plotcluster() visualisation of a clustering or grouping in data (fpc)
plot.hclust() plot clusters (stats)
plot.agnes(), plot.diana(), plot.mona(), plot.partition() plot clusters (cluster)
bannerplot() a horizontal barplot visualizing a hierarchical clustering (cluster)

Cluster Validation
silhouette() compute or extract silhouette information (cluster)
cluster.stats() compute several cluster validity statistics from a clustering and a dissimilarity matrix (fpc)
clValid() calculate validation measures for a given set of clustering algorithms and number of clusters (clValid)
clustIndex() calculate the values of several clustering indexes, which can be independently used to determine the number of clusters existing in a data set

Packages
cluster cluster analysis
fpc various methods for clustering and cluster validation
mclust model-based clustering and normal mixture modeling
pvclust hierarchical clustering with p-values
apcluster Affinity Propagation Clustering
cclust Convex Clustering methods, including k-means algorithm, On-line Update algorithm and Neural Gas algorithm and calculation of indexes for finding the number of clusters in a data set
cba Clustering for Business Analytics, including clustering techniques such as Proximus and Rock
bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for clustering high-dimensional data
biclust algorithms to find bi-clusters in two-dimensional data
clue cluster ensembles
clues clustering method based on local shrinking
clValid validation of clustering results
clv cluster validation techniques, contains popular internal and external cluster validation methods for outputs produced by package cluster
clustTool GUI for clustering data with spatial information
bayesclust tests/searches for significant clusters in genetic data
clustvarsel variable selection for model-based clustering
clustsig significant cluster analysis, tests to see which (if any) clusters are statistically different
clusterfly explore clustering interactively
clusterSim search for optimal clustering procedure for a data set
clusterGeneration random cluster generation
clusterCons calculate the consensus clustering result from re-sampled clustering experiments with the option of using multiple algorithms and parameter
gcExplorer graphical cluster explorer
hybridHclust hybrid hierarchical clustering via mutual clusters
Modalclust hierarchical modal Clustering
iCluster integrative clustering of multiple genomic data types
EMCC Evolutionary Monte Carlo (EMC) methods for clustering
rEMM Extensible Markov Model (EMM) for Data Stream Clustering
SGCS Spatial Graph based Clustering Summaries for spatial point patterns

Text Mining

Functions
TermDocumentMatrix(), DocumentTermMatrix() construct a term-document matrix or a document-term matrix (tm)
Dictionary() construct a dictionary from a character vector or a term-document matrix (tm)
findAssocs() find associations in a term-document matrix (tm)
findFreqTerms() find frequent terms in a term-document matrix (tm)
stemDocument() stem words in a text document (tm)
stemCompletion() complete stemmed words (tm)
termFreq() generate a term frequency vector from a text document (tm)
stopwords(language) return stopwords in different languages (tm)
removeNumbers(), removePunctuation(), removeWords() remove numbers, punctuation marks, or a set of words from a text document (tm)
removeSparseTerms() remove sparse terms from a term-document matrix (tm)
textcat() n-gram based text categorization (textcat)
SnowballStemmer() Snowball word stemmers (Snowball)

Packages
tm a framework for text mining applications
tm.plugin.dc a plug-in for package tm to support distributed text mining
tm.plugin.mail a plug-in for package tm to handle mail
RcmdrPlugin.TextMining GUI for demonstration of text mining concepts and tm package
textir a suite of tools for inference about text documents and associated sentiment
tau utilities for text analysis
textcat n-gram based text categorization
YjdnJlp Japanese text analysis by Yahoo! Japan Developer Network

Graphics

Functions
plot() generic function for plotting (graphics)
barplot(), pie(), hist() bar chart, pie chart and histogram (graphics)
boxplot() box-and-whisker plot (graphics)
stripchart() one dimensional scatter plot (graphics)
dotchart() Cleveland dot plot (graphics)
qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot (stats)
coplot() conditioning plot (graphics)
splom() conditional scatter plot matrices (lattice)
pairs() a matrix of scatterplots (graphics)
cpairs() enhanced scatterplot matrix (gclus)
parcoord() parallel coordinate plot (MASS)
cparcoord() enhanced parallel coordinate plot (gclus)
paracoor() parallel coordinates plot (denpro)
parallel() parallel coordinates plot (lattice)
densityplot() kernel density plot (lattice)
contour(), filled.contour() contour plot (graphics)
levelplot(), contourplot() level plots and contour plots (lattice)
sunflowerplot() a sunflower scatter plot (graphics)
assocplot() association plot (graphics)
mosaicplot() mosaic plot (graphics)
matplot() plot the columns of one matrix against the columns of another (graphics)
fourfoldplot() a fourfold display of a 2 × 2 × k contingency table (graphics)
persp() perspective plots of surfaces over the xy plane (graphics)
cloud(), wireframe() 3d scatter plots and surfaces (lattice)
interaction.plot() two-way interaction plot (stats)
iplot(), ihist(), ibar(), ipcp() interactive scatter plot, histogram, bar plot, and parallel coordinates plot (iplots)
pdf(), postscript(), win.metafile(), jpeg(), bmp(), png(), tiff() save graphs into files of various formats

Packages
lattice a powerful high-level data visualization system, with an emphasis on multivariate data
vcd visualizing categorical data
denpro visualization of multivariate, functions, sets, and data
iplots interactive graphics

Time Series Analysis

Construction & Plot
ts() create time-series objects (stats)
plot.ts() plot time-series objects (stats)
smoothts() time series smoothing (ast)
sfilter() remove seasonal fluctuation using moving average (ast)

Decomposition
decomp() time series decomposition by square-root filter (timsac)
decompose() classical seasonal decomposition by moving averages (stats)
stl() seasonal decomposition of time series by loess (stats)
tsr() time series decomposition (ast)
ardec() time series autoregressive decomposition (ArDec)

Forecasting
arima() fit an ARIMA model to a univariate time series (stats)
predict.Arima() forecast from models fitted by arima (stats)

Packages
timsac time series analysis and control program
ast time series analysis
ArDec time series autoregressive-based decomposition
ares a toolbox for time series analyses using generalized additive models
dse tools for multivariate, linear, time-invariant, time series models
forecast displaying and analysing univariate time series forecasts

Statistics

Analysis of Variance
aov() fit an analysis of variance model (stats)
anova() compute analysis of variance (or deviance) tables for one or more fitted model objects (stats)

Statistical Test
t.test() Student's t-test (stats)
prop.test() test of equal or given proportions (stats)
binom.test() exact binomial test (stats)

Mixed Effects Models
lme() fit a linear mixed-effects model (nlme)
nlme() fit a nonlinear mixed-effects model (nlme)

Principal Components and Factor Analysis
princomp() principal components analysis (stats)
prcomp() principal components analysis (stats)

Other Functions
var(), cov(), cor() variance, covariance, and correlation (stats)
density() compute kernel density estimates (stats)

Packages
nlme linear and nonlinear mixed effects models

Data Manipulation
na.fail(), na.omit(), na.exclude(), na.pass() handle missing values
scale() scaling and centering of matrix-like objects
t() matrix transpose
aperm() array transpose
sample() sampling
table(), tabulate(), xtabs() cross tabulation (stats)
stack(), unstack() stacking vectors
reshape() reshape a data frame between “wide” format and “long” format (stats)
merge() merge two data frames
aggregate() compute summary statistics of data subsets (stats)
by() apply a function to a data frame split by factors
tapply() apply a function to each cell of a ragged array

Editors/GUIs
Tinn-R a free GUI for R language and environment.
rattle graphical user interface for data mining in R
Rpad workbook-style, web-based interface to R
RPMG graphical user interface (GUI) for interactive R analysis sessions

Other R Reference Cards
R Reference Card, by Tom Short
http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/R-refcard.pdf or
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
R Reference Card, by Jonathan Baron
http://cran.r-project.org/doc/contrib/refcard.pdf
R Functions for Regression Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
R Functions for Time Series Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

Data Access

Functions
save(), load() save and load R data objects
read.csv(), write.csv() import from and export to .CSV files
read.table(), write.table(), scan(), write() read and write data
write.matrix() write a matrix or data frame (MASS)
sqlQuery() submit an SQL query to an ODBC database (RODBC)
odbcConnect(), odbcClose() open/close connections to ODBC databases (RODBC)
dbSendQuery() execute an SQL statement on a given database connection (DBI)
dbConnect(), dbDisconnect() create/close a connection to a DBMS (DBI)

Packages
RODBC ODBC database access
DBI a database interface (DBI) between R and relational DBMS
RMySQL interface to the MySQL database
RJDBC access to databases through the JDBC interface
ROracle Oracle database interface (DBI) driver
RODM interface to Oracle Data Mining
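
As a minimal sketch of the file-based Data Access functions listed on this card, the following round-trips a small data frame through a CSV file and an .RData file using only base R. The data frame, its column names and the file names are made up for illustration, and everything is written to a temporary directory.

```r
# Illustrative data only: the columns id and score are hypothetical.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Work in a temporary directory so no real files are touched.
tmp_dir    <- tempdir()
csv_path   <- file.path(tmp_dir, "example.csv")
rdata_path <- file.path(tmp_dir, "example.RData")

# write.csv() / read.csv(): export to and import from a .CSV file.
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)

# save() / load(): store an R object under its name and restore it later.
save(df, file = rdata_path)
rm(df)
loaded_names <- load(rdata_path)  # recreates df and returns the restored names
```

Here load() returns the names of the objects it restored, which is a convenient way to check what a saved file contained.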
Interface to Weka
Package RWeka is an R interface to Weka that makes the following Weka
functions available in R.
Association rules:
Apriori(), Tertius()
Regression and classification:
LinearRegression(), Logistic(), SMO()
Lazy classifiers:
IBk(), LBR()
Meta classifiers:
AdaBoostM1(), Bagging(), LogitBoost(),
MultiBoostAB(), Stacking(),
CostSensitiveClassifier()
Rule classifiers:
JRip(), M5Rules(), OneR(), PART()
Regression and classification trees:
J48(), LMT(), M5P(), DecisionStump()
Clustering:
Cobweb(), FarthestFirst(), SimpleKMeans(),
XMeans(), DBScan()
Filters:
Normalize(), Discretize()
Word stemmers:
IteratedLovinsStemmer(), LovinsStemmer()
Tokenizers:
AlphabeticTokenizer(), NGramTokenizer(),
WordTokenizer()
Generating Reports
Sweave() mixing text and S code for automatic report generation
R2HTML making HTML reports
R2PPT generating Microsoft PowerPoint presentations
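
To make the Sweave() entry above more concrete, here is a small sketch that writes a tiny .Rnw source file and weaves it into a .tex file. The file name report.Rnw, the chunk label and the document text are all invented for this example, and the default LaTeX driver shipped with R is assumed. Compiling the resulting .tex file to PDF is a separate step that is not shown.

```r
# Work in a fresh temporary directory; Sweave() writes its output
# into the current working directory.
work_dir <- tempfile("sweave_demo")
dir.create(work_dir)
old_wd <- setwd(work_dir)

# A minimal Sweave source: LaTeX text, one inline \Sexpr{} expression,
# and one code chunk delimited by <<...>>= and @.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<demo-summary, echo=TRUE>>=",
  "summary(c(1, 2, 3, 4))",
  "@",
  "\\end{document}"
)
writeLines(rnw, "report.Rnw")

# Run the code in the chunks and write report.tex next to the source.
utils::Sweave("report.Rnw", quiet = TRUE)
tex <- readLines("report.tex")

# Restore the previous working directory.
setwd(old_wd)
```

The woven file contains the LaTeX text with the inline expression replaced by its value, plus the chunk's code and printed output wrapped in Sweave's LaTeX environments.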