01823, 11:30 infographics-fnal jog (1000*10185)
Data Exploration in Python Using Pandas
CHEATSHEET

NumPy stands for Numerical Python. This library contains basic linear algebra functions, Fourier transforms, and advanced random number capabilities.

Pandas is for structured data operations and manipulations. It is extensively used for data munging and preparation.
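Contents
- How to load data files?
- How to convert a variable to a different data type?
- How to transpose a data set?
- How to create plots (Histogram, Scatter, Box Plot)?
- How to sample a data set in Python?
- How to remove duplicate values of a variable?
- How to group variables in Python to calculate count, average, sum?
- How to recognize and treat missing values and outliers?
- How to merge / join data sets?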
Source: https://www.analyticsvidhya.com/wp-content/uploads/2015/06/infographics-final.jpg
How to load data files?
Function        Description
read_csv        Read delimited data from a file. Uses comma as the default delimiter.
read_table      Read delimited data from a file. Uses tab ('\t') as the default delimiter.
read_excel      Read data from an Excel file.
read_fwf        Read data in fixed-width column format.
read_clipboard  Read data from the clipboard. Useful for converting tables from web pages.
#Import library Pandas
#I am working in a Windows environment
#Reading the dataset in a dataframe using Pandas
# Load the Data sheet of the Excel file EMP
# Load data from a text file with tab '\t' delimiter
print(df)
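The loading calls themselves did not survive in the text above, so the following is only a minimal sketch of how read_csv and read_table are typically used. The in-memory strings and their contents are made up for illustration and stand in for real files on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory data standing in for files on disk
csv_text = "EMPID,Gender,Age\nE001,M,30\nE002,F,35\n"
tab_text = "EMPID\tGender\tAge\nE001\tM\t30\nE002\tF\t35\n"

# read_csv uses a comma as the default delimiter
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table uses a tab as the default delimiter
df_tab = pd.read_table(io.StringIO(tab_text))

print(df_csv)
print(df_tab)
```

With a real file, the StringIO object would be replaced by a path string; pd.read_excel takes a path and a sheet name in the same way.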
How to convert a variable to a different data type?
- Convert numeric variables to string variables
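As a rough illustration of the numeric-to-string conversion, the sketch below uses astype(str) on a made-up Age column. The reverse conversion with pd.to_numeric is an addition for completeness and is not taken from the cheatsheet.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"Age": [30, 35, 40]})

# Numeric to string: every value becomes text
age_as_text = df["Age"].astype(str)

# String back to numeric (an extra step, not part of the original sheet)
age_as_number = pd.to_numeric(age_as_text)

# A single value can be converted with the built-in str()
single_value = str(30)

print(age_as_text.tolist())
print(age_as_number.tolist())
```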
How to transpose a data set?
#Transposing dataframe by a variable
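The transposing code is missing from the text, so this is one possible sketch that assumes "transposing by a variable" means reshaping from long to wide with DataFrame.pivot. The column names ID, Product and Sales follow the table shown in the sheet; the values are made up.

```python
import pandas as pd

# Made-up long-format data: one row per (ID, Product) pair
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Product": ["A", "B", "A", "B"],
    "Sales": [10, 20, 30, 40],
})

# Spread the Product values into columns, one row per ID
wide = df.pivot(index="ID", columns="Product", values="Sales")
print(wide)

# A plain transpose (rows become columns) is available as df.T
flipped = df.T
```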
# Sort by Product (ascending), then by Sales (descending)
df = pd.read_excel("E:/transpose.xlsx", "Sheet1")
print(df.sort_values(['Product', 'Sales'], ascending=[True, False]))
[Figure: Original Table and Sorted Table, each with 4 rows and 3 columns (ID, Product, Sales)]
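A small self-contained sketch of the same sort follows. It uses sort_values, since the older DataFrame.sort method is no longer available in current pandas. The four rows are made up and are not the values from the original table.

```python
import pandas as pd

# Made-up table with the same columns as the one in the sheet
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Product": ["B", "A", "A", "B"],
    "Sales": [5, 8, 12, 7],
})

# Product ascending, and within each product Sales descending
sorted_df = df.sort_values(["Product", "Sales"], ascending=[True, False])
print(sorted_df)
```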
How to create plots (Histogram, Scatter, Box Plot)?
EMPID  Gender  Age  Sales
E001   M       38   ?
E002   F       ?    ?
E003   F       37   135
E004   M       30   138
E005   F       ?    ?
E006   M       36   ?
E007   M       32   138
E008   F       26   140
E009   M       ?    133
E010   M       36   133
(? marks a value that is illegible in the source.)
Code
#Plot Histogram
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_excel("E:/First.xlsx", "Sheet1")
#Plots in matplotlib reside within a figure object, use plt.figure to create a new figure
fig = plt.figure()
#Create one or more subplots using add_subplot, because you can't create a blank figure
ax = fig.add_subplot(1,1,1)
#Variable
ax.hist(df['Age'], bins = 5)
#Labels and Title
plt.title('Age distribution')
plt.xlabel('Age')
plt.ylabel('#Employee')
plt.show()
#Plot Scatter
#Plots in matplotlib reside within a figure object, use plt.figure to create a new figure
fig = plt.figure()
#Create one or more subplots using add_subplot, because you can't create a blank figure
ax = fig.add_subplot(1,1,1)
#Variable
ax.scatter(df['Age'], df['Sales'])
#Labels and Title
plt.title('Sales and Age distribution')
plt.xlabel('Age')
plt.ylabel('Sales')
plt.show()
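The heading also mentions a box plot, but no box plot code survives in the text. The sketch below is one possible way to draw it with matplotlib's Axes.boxplot, following the same figure and subplot pattern as above. The Age values are made up, and the Agg backend and in-memory buffer are assumptions chosen so the example runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages for illustration
df = pd.DataFrame({"Age": [25, 31, 36, 29, 42, 38, 33, 27]})

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Draw one box summarising the Age column
result = ax.boxplot(df["Age"].to_numpy())

ax.set_title("Age distribution")
ax.set_ylabel("Age")

# Render to an in-memory PNG instead of calling plt.show()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, plt.show() can be used in place of the savefig call.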
How to sample a data set in Python?
#Create Sample dataframe
import numpy as np
import pandas as pd
from random import sample
# create random index
rindex = np.array(sample(range(len(df)), 5))
# get 5 random rows from df
dfr = df.iloc[rindex]
print(dfr)
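As an alternative to building a random index by hand, pandas also provides DataFrame.sample. The sketch below swaps in that method; the data and the random_state value are made up for illustration.

```python
import pandas as pd

# Made-up data: ten employees
df = pd.DataFrame({
    "EMPID": [f"E{i:03d}" for i in range(1, 11)],
    "Age": list(range(20, 30)),
})

# Draw 5 rows at random without replacement; random_state makes it repeatable
dfr = df.sample(n=5, random_state=0)
print(dfr)
```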
How to remove duplicate values of a variable?
Code
rem_dup = df.drop_duplicates(['Gender', 'BMI'])
print(rem_dup)
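The output of this step is not legible in the text, so here is a small sketch on made-up data showing what drop_duplicates keeps: the first row for each distinct (Gender, BMI) combination.

```python
import pandas as pd

# Made-up data; rows 0 and 2 share the same Gender and BMI
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M"],
    "BMI": ["Normal", "Normal", "Normal", "Overweight", "Overweight"],
})

# Keep the first occurrence of each (Gender, BMI) pair
rem_dup = df.drop_duplicates(subset=["Gender", "BMI"])
print(rem_dup)
```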
How to group variables in Python to calculate count, average, sum?
test = df.groupby(['Gender'])
test.describe()
[Output: descriptive statistics for each Gender group]
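To get only the count, average and sum named in the heading, rather than the full describe() summary, one option is agg on a grouped column. This is a sketch on made-up data.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "Sales": [100, 120, 140, 160],
})

# One row per Gender, with count, mean and sum of Sales
summary = df.groupby("Gender")["Sales"].agg(["count", "mean", "sum"])
print(summary)
```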
How to recognize and treat missing values and outliers?
# Identify missing values of a dataframe
df.isnull()
[Output: a table of True/False flags, one column per variable (EMPID, Gender, Age, ...)]
Code:
# Impute missing values in Age with the mean
import numpy as np
meanAge = np.mean(df.Age)
df.Age = df.Age.fillna(meanAge)
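The two steps above can be tried end to end on a tiny made-up column. The sketch relies on the fact that the mean of a pandas Series skips missing values.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing Age
df = pd.DataFrame({"Age": [30.0, np.nan, 40.0]})

# Flag the missing entries
missing = df["Age"].isnull()

# The Series mean ignores NaN, so this is the mean of 30 and 40
mean_age = df["Age"].mean()

# Replace the missing entry with the mean
df["Age"] = df["Age"].fillna(mean_age)
print(df)
```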
How to merge / join data sets?
df_new = pd.merge(df1, df2, how = 'inner', left_index = True, right_index = True)
# merges df1 and df2 on index
# By changing how = 'outer', you can do an outer join.
# Similarly, how = 'left' will do a left join.
# You can also specify the columns to join on (on, left_on, right_on) instead of the indexes.
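The sketch below compares the three join types on two made-up frames, joining on a shared column with the on parameter rather than on the index. The column names and values are illustrative only.

```python
import pandas as pd

# Made-up frames that share two of their EMPID values
df1 = pd.DataFrame({"EMPID": ["E001", "E002", "E003"], "Age": [30, 35, 40]})
df2 = pd.DataFrame({"EMPID": ["E002", "E003", "E004"], "Sales": [120, 130, 140]})

# Inner join: only EMPIDs present in both frames
inner = pd.merge(df1, df2, how="inner", on="EMPID")

# Outer join: every EMPID from either frame
outer = pd.merge(df1, df2, how="outer", on="EMPID")

# Left join: every EMPID from df1, with Sales missing where df2 has no match
left = pd.merge(df1, df2, how="left", on="EMPID")

print(inner)
print(outer)
print(left)
```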
To view the complete guide on Data Exploration in Python, visit http://bit.ly/1KWhaHH
Analytics Vidhya: Learn Everything About Analytics