Python For R Users
By Chandan Routray
As part of an internship at www.decisionstats.com
Basic Commands
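Functions | R | Python
Install a package | install.packages('name') | pip install name (run from the command line, not inside Python)
Load a package | library('name') | import name as other_name
Get the working directory | getwd() | import os; os.getcwd()
Set the working directory | setwd('path') | os.chdir('path')
List files in the working directory | dir() | os.listdir()
List objects in the workspace | ls() | globals()
Remove an object | rm('name') | del name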
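The commands above can be tried safely in a throwaway directory. This is a minimal sketch using only the standard library; the temporary directory, the file notes.txt and the variable scratch are illustrative assumptions, not part of the original slides.

```python
import os
import tempfile

original_dir = os.getcwd()                # like getwd() in R

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                         # like setwd(tmp)
    open("notes.txt", "w").close()        # create an empty file so there is something to list
    files = os.listdir()                  # like dir(): files in the working directory
    os.chdir(original_dir)                # move back before the temporary directory is deleted

scratch = [1, 2, 3]                       # hypothetical example object
print("scratch" in globals())             # True, like "scratch" %in% ls()
del scratch                               # like rm(scratch); del takes a bare name, not a string
print("scratch" in globals())             # False
print(files)                              # ['notes.txt']
```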
Dec 2014
Copyright www.decisionstats.com
R | Python (using the pandas package*)
A <- matrix(runif(24, 0, 1), nrow = 6, ncol = 4)
df <- data.frame(A)
Here,
runif generates 24 random numbers between 0 and 1.
matrix arranges those numbers into a matrix; nrow and ncol set its numbers of rows and columns.
data.frame converts the matrix to a data frame.
import numpy as np
import pandas as pd
A = np.random.randn(6, 4)
df = pd.DataFrame(A)
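Here,
np.random.randn, part of the NumPy** library, generates a 6 x 4 array of random numbers drawn from the standard normal distribution; np.random.rand(6, 4) gives uniform numbers between 0 and 1, like runif.
pd.DataFrame converts the array into a data frame.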
*To install the pandas library, visit http://pandas.pydata.org/; to import it, type: import pandas as pd
**To import the NumPy library, type: import numpy as np
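Because runif() draws uniform numbers while np.random.randn draws normal ones, a closer Python match is sketched below, assuming NumPy 1.17 or later for default_rng; the seed 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are reproducible (the role set.seed() plays in R)
rng = np.random.default_rng(42)

# 24 uniform draws on [0, 1), shaped 6 x 4, like matrix(runif(24, 0, 1), nrow = 6, ncol = 4)
A = rng.uniform(0.0, 1.0, size=(6, 4))
df = pd.DataFrame(A)

print(df.shape)   # (6, 4)
```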
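R | Python (using the pandas package*)
rownames(df) | df.index
colnames(df) | df.columns
head(df, x) | df.head(x)
tail(df, x) | df.tail(x)
dim(df) | df.shape
length(df) | len(df.columns)
nrow(df) | len(df)
Note that length() of an R data frame counts its columns, while len() of a pandas DataFrame counts its rows.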
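The row/column distinction in the last two rows is easy to trip over. This small sketch uses a made-up two-column data frame to show what each call returns.

```python
import pandas as pd

df = pd.DataFrame({"P": [1, 2, 3], "Q": [4, 5, 6]})   # made-up example data

n_rows = len(df)            # 3, like nrow(df) in R
n_cols = len(df.columns)    # 2, like length(df) or ncol(df) in R
rows, cols = df.shape       # (3, 2), like dim(df)

print(n_rows, n_cols, (rows, cols))        # 3 2 (3, 2)
print(list(df.index), list(df.columns))    # [0, 1, 2] ['P', 'Q']
```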
R | Python (using the pandas package*)
summary(df) | df.describe()
rownames(df) <- c('A', 'B', 'C', 'D', 'E', 'F') | df.index = ['A', 'B', 'C', 'D', 'E', 'F']
colnames(df) <- c('P', 'Q', 'R', 'S') | df.columns = ['P', 'Q', 'R', 'S']
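A sketch of the same relabelling on a made-up 6 x 4 frame of consecutive integers, plus DataFrame.rename() for changing one name at a time; the new column name T is an illustrative assumption. Note that describe() also reports the count and standard deviation, which R's summary() does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4))   # made-up data: 0..23 in six rows

# Assign labels positionally, as rownames(df) <- ... and colnames(df) <- ... do in R
df.index = ["A", "B", "C", "D", "E", "F"]
df.columns = ["P", "Q", "R", "S"]

# Rename a single column by name instead of retyping the whole list
df = df.rename(columns={"S": "T"})

stats = df.describe()          # count, mean, std, min, quartiles and max per column
print(stats.loc["mean", "P"])  # 10.0 (column P holds 0, 4, 8, 12, 16, 20)
```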
R | Python (using the pandas package*)
df[order(df$P), ] | df.sort_values('P')
Older pandas versions used df.sort(['P']); that method has since been removed in favour of sort_values().
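sort_values() also covers descending order and multi-column sorts. A sketch on a made-up three-row frame; like R, pandas keeps the original row labels after sorting, so call reset_index(drop=True) if you want them renumbered.

```python
import pandas as pd

df = pd.DataFrame({"P": [3, 1, 2], "Q": ["c", "a", "b"]})   # made-up example data

ascending = df.sort_values("P")                     # df[order(df$P), ]
descending = df.sort_values("P", ascending=False)   # df[order(-df$P), ]
by_two = df.sort_values(["Q", "P"])                 # df[order(df$Q, df$P), ]

print(ascending["P"].tolist())    # [1, 2, 3]
print(descending["P"].tolist())   # [3, 2, 1]
```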
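R | Python (using the pandas package*)
df[x:y, ] | df.iloc[x-1:y]
df[, c('X', 'Y')] | df.loc[:, ['X', 'Y']]
df[x:y, a:b] | df.iloc[x-1:y, a-1:b]
df[x, y] | df.iat[x-1, y-1]
Python starts counting from 0, so an R position x becomes x-1; the end of a Python slice is excluded, so y stays as it is.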
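Translating R's 1-based, end-inclusive ranges into Python's 0-based, end-exclusive slices is the usual source of off-by-one errors. The sketch below wraps the rule in a hypothetical helper, r_slice(), which is not part of pandas; the 6 x 4 example frame is made up.

```python
import numpy as np
import pandas as pd

def r_slice(start, end):
    """Hypothetical helper: turn R's 1-based, inclusive start:end into a Python slice."""
    return slice(start - 1, end)   # shift the start to 0-based; Python's end is already exclusive

df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["P", "Q", "R", "S"])

rows_2_to_4 = df.iloc[r_slice(2, 4)]             # df[2:4, ]
block = df.iloc[r_slice(2, 4), r_slice(1, 2)]    # df[2:4, 1:2]
cell = df.iat[2 - 1, 3 - 1]                      # df[2, 3]
by_name = df.loc[:, ["P", "R"]]                  # df[, c("P", "R")]

print(rows_2_to_4.shape, block.shape, cell)      # (3, 4) (3, 2) 6
```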
R | Python (using the pandas package*)
subset(df, A > 0) | df[df.A > 0]
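Conditions can be combined with & and |, each wrapped in parentheses, and DataFrame.query() accepts a string expression that reads much like subset(). A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.5, 0.2, 3.0, -0.7], "B": [10, 20, 30, 40]})   # made-up data

positive = df[df["A"] > 0]                   # subset(df, A > 0)
both = df[(df["A"] > 0) & (df["B"] > 25)]    # subset(df, A > 0 & B > 25)
same = df.query("A > 0 and B > 25")          # string form, closer to subset()

print(positive["B"].tolist())   # [20, 30]
print(both.equals(same))        # True
```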
Mathematical Functions
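Functions | R | Python (import the math and numpy libraries)
Sum | sum(x) | math.fsum(x)
Square root | sqrt(x) | math.sqrt(x) for a single number; numpy.sqrt(x) for a vector
Standard deviation | sd(x) | numpy.std(x, ddof=1) (ddof=1 gives the sample standard deviation that sd() returns; the default ddof=0 divides by n instead of n - 1)
Log | log(x, base) | math.log(x[, base])
Mean | mean(x) | numpy.mean(x)
Median | median(x) | numpy.median(x)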
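A sketch checking the table against a small made-up vector whose population standard deviation is exactly 2, which makes the ddof difference visible. It uses numpy.sqrt because math.sqrt accepts only a single number, whereas R's sqrt() works element-wise on vectors.

```python
import math
import numpy as np

x = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up example vector

total = math.fsum(x)             # 40.0, like sum(x)
avg = np.mean(x)                 # 5.0, like mean(x)
middle = np.median(x)            # 4.5, like median(x)
pop_sd = np.std(x)               # 2.0: population SD, divides by n
sample_sd = np.std(x, ddof=1)    # about 2.138: sample SD, divides by n - 1, matches sd(x) in R
roots = np.sqrt(x)               # element-wise, like sqrt(x)
log2_of_8 = math.log(8, 2)       # log of 8 in base 2, like log(8, base = 2)

print(total, avg, middle, pop_sd, sample_sd)
```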
Data Manipulation
R | Python (import the math library)
as.numeric(x) | float(x)
paste(x) | str(x)
is.na(x) | math.isnan(x)
na.omit(list) | cleanedList = [x for x in list if str(x) != 'nan']
nchar(x) | len(x)
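A sketch of missing-value handling on a made-up list of floats. pandas' isna() and dropna() also treat None as missing, which is closer to how R's is.na() and na.omit() behave than math.isnan(), which only accepts numbers.

```python
import math
import pandas as pd

values = [1.0, float("nan"), 3.5, float("nan")]   # made-up example data

flags = [math.isnan(v) for v in values]               # is.na(x), one element at a time
cleaned = [v for v in values if not math.isnan(v)]    # na.omit(x) for a list of floats

s = pd.Series(values)
print(s.isna().tolist())       # [False, True, False, True]
print(s.dropna().tolist())     # [1.0, 3.5]
print(len("decisionstats"))    # 13, like nchar("decisionstats")
```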
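R | Python (import datetime)
Sys.time() | datetime.datetime.now()

R (ymd_hms() comes from the lubridate package):
library(lubridate)
d <- Sys.time()
d_format <- ymd_hms(d)

Python:
import datetime
d = datetime.datetime.now()
fmt = '%Y-%b-%d %H:%M:%S'
d_format = d.strftime(fmt)

strftime() turns a date-time into text in the given format; in R, format(d, '%Y-%b-%d %H:%M:%S') does the same, while ymd_hms() parses text into a date-time.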
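Formatting and parsing are inverses: strftime() turns a datetime into text and datetime.strptime() parses it back, roughly the roles of format() and ymd_hms() in R. A sketch using a fixed, made-up date so the output is predictable; %m (month number) is used instead of %b because month abbreviations depend on the locale.

```python
import datetime

now = datetime.datetime.now()                       # like Sys.time()

stamp = datetime.datetime(2014, 12, 1, 9, 30, 0)   # fixed example date
fmt = "%Y-%m-%d %H:%M:%S"
text = stamp.strftime(fmt)                          # datetime to string, like format(d, fmt)
parsed = datetime.datetime.strptime(text, fmt)      # string to datetime, like ymd_hms(text)

print(text)              # 2014-12-01 09:30:00
print(parsed == stamp)   # True
```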
Data Visualization
R | Python (import matplotlib.pyplot as plt)
plot(variable1, variable2) | plt.scatter(variable1, variable2); plt.show()
boxplot(Var) | plt.boxplot(Var); plt.show()
hist(Var) | plt.hist(Var); plt.show()
pie(Var) | plt.pie(Var); plt.show()
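Instead of calling plt.show() after each chart, matplotlib's object-oriented interface can draw several charts into one figure and save it to a file. A sketch, assuming the non-interactive Agg backend and made-up random data; the file name plots.png is arbitrary.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")             # non-interactive backend: draw to files, no window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)          # made-up data
y = 2 * x + rng.normal(size=100)

# One figure with four panels, like par(mfrow = c(2, 2)) in R
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].scatter(x, y)                                  # plot(x, y)
axes[0, 1].boxplot(x)                                     # boxplot(x)
axes[1, 0].hist(x, bins=10)                               # hist(x)
axes[1, 1].pie([30, 45, 25], labels=["a", "b", "c"])      # pie(c(30, 45, 25))
fig.tight_layout()

out = Path("plots.png")
fig.savefig(out)                  # like png("plots.png"); ...; dev.off() in R
plt.close(fig)
```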
Thank You
For feedback, contact DecisionStats.com
Coming up
Output: Virginica
Output: 1.64
Output: 1.65
Random Forest

R (requires the randomForest package):
library(randomForest)
data(iris)
total_size <- dim(iris)[1]
num_target <- c(rep(0, total_size))
for (i in 1:length(num_target)) {
  if (iris$Species[i] == 'setosa') {num_target[i] <- 0}
  else if (iris$Species[i] == 'versicolor') {num_target[i] <- 1}
  else {num_target[i] <- 2}
}
iris$Species <- num_target
train_set <- iris[1:149, ]
test_set <- iris[150, ]
# Species is now numeric, so randomForest fits a regression forest
iris.rf <- randomForest(Species ~ ., data = train_set, ntree = 100,
                        importance = TRUE, proximity = TRUE)
print(iris.rf)
predict(iris.rf, test_set, predict.all = TRUE)

Python (scikit-learn):
from sklearn import ensemble
from sklearn import datasets
clf = ensemble.RandomForestClassifier(n_estimators=100, max_depth=10)
iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]   # train on all rows except the last
clf.fit(X, y)
print(clf.predict(iris.data[-1:]))         # predict the last row (passed as a 2-D slice)
Output: 1.845
Output: 2
Output: Virginica
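clf.predict() returns a class code (0, 1 or 2) rather than a species name. A sketch that maps the code back through iris.target_names and asks for class probabilities averaged over the trees; random_state=0 is an assumption added for reproducibility. In R, keeping Species as a factor instead of converting it to 0/1/2 makes randomForest fit a classification forest and return the species name directly.

```python
from sklearn import datasets, ensemble

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]   # train on all rows except the last

clf = ensemble.RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
clf.fit(X, y)

last_row = iris.data[-1:]                 # 2-D, one row: what predict() expects
pred = clf.predict(last_row)              # array holding one class code
label = iris.target_names[pred[0]]        # map the code back to a species name
proba = clf.predict_proba(last_row)       # class probabilities averaged over the trees

print(pred, label, proba)
```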
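Naive Bayes

R (requires the e1071 package):
library(e1071)
data(iris)   # reload iris: the random forest code replaced Species with numbers
trainset <- iris[1:149, ]
testset <- iris[150, ]
classifier <- naiveBayes(trainset[, 1:4], trainset[, 5])
predict(classifier, testset[, 1:4])

Python (scikit-learn):
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]
clf.fit(X, y)
print(clf.predict(iris.data[-1:]))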
Output: Virginica
Output: Virginica
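Predicting a single held-out row says little about accuracy. A sketch using scikit-learn's train_test_split to hold out 30% of the rows, with stratify and random_state=0 chosen here as assumptions, then scoring GaussianNB on them:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# Hold out 30% of rows, stratified so every species appears in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

clf = GaussianNB()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)                  # fraction of test rows classified correctly
names = iris.target_names[clf.predict(X_test[:3])]    # first three test predictions as species names
print(round(accuracy, 3), list(names))
```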