Python Libraries Cheat Sheets

This document collects cheat sheets for core Python data libraries: pandas, NumPy, Seaborn, Matplotlib, and scikit-learn. The pandas sheet covers tidy data and data wrangling: tidy data keeps each variable in its own column and each observation in its own row, and the sheet outlines how to create, reshape, subset and query DataFrames.


Data Wrangling with pandas Cheat Sheet
http://pandas.pydata.org
See also: Pandas API Reference, Pandas User Guide

Tidy Data – A foundation for wrangling in pandas

In a tidy data set:
- Each variable is saved in its own column.
- Each observation is saved in its own row.

Tidy data complements pandas's vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
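For example, a minimal sketch (with made-up values) of gathering a wide table into tidy form, where 'year' becomes a variable column and each row holds one observation:

import pandas as pd

# wide (untidy) layout: one column per year
wide = pd.DataFrame({"country": ["A", "B"],
                     "1999": [100, 200],
                     "2000": [110, 220]})

# tidy layout: pd.melt gathers the year columns into rows
tidy = pd.melt(wide, id_vars="country",
               var_name="year", value_name="cases")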
Creating DataFrames

   a  b  c
1  4  7  10
2  5  8  11
3  6  9  12

df = pd.DataFrame(
    {"a" : [4, 5, 6],
     "b" : [7, 8, 9],
     "c" : [10, 11, 12]},
    index = [1, 2, 3])
    Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
    Specify values for each row.

n  v   a  b  c
d  1   4  7  10
d  2   5  8  11
e  2   6  9  12

df = pd.DataFrame(
    {"a" : [4, 5, 6],
     "b" : [7, 8, 9],
     "c" : [10, 11, 12]},
    index = pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
    Create a DataFrame with a MultiIndex.

Reshaping Data – Change layout, sorting, reindexing, renaming

df.sort_values('mpg')
    Order rows by values of a column (low to high).
df.sort_values('mpg', ascending=False)
    Order rows by values of a column (high to low).
pd.melt(df)
    Gather columns into rows.
df.pivot(columns='var', values='val')
    Spread rows into columns.
df.rename(columns = {'y': 'year'})
    Rename the columns of a DataFrame.
df.sort_index()
    Sort the index of a DataFrame.
df.reset_index()
    Reset the index of a DataFrame to row numbers, moving the index to columns.
pd.concat([df1, df2])
    Append rows of DataFrames.
pd.concat([df1, df2], axis=1)
    Append columns of DataFrames.
df.drop(columns=['Length', 'Height'])
    Drop columns from a DataFrame.

Subset Observations - rows

df[df.Length > 7]
    Extract rows that meet logical criteria.
df.drop_duplicates()
    Remove duplicate rows (considers all columns).
df.sample(frac=0.5)
    Randomly select a fraction of rows.
df.sample(n=10)
    Randomly select n rows.
df.nlargest(n, 'value')
    Select and order the top n entries.
df.nsmallest(n, 'value')
    Select and order the bottom n entries.
df.head(n)
    Select the first n rows.
df.tail(n)
    Select the last n rows.

Using query

query() allows Boolean expressions for filtering rows.

df.query('Length > 7')
df.query('Length > 7 and Width < 8')
df.query('Name.str.startswith("abc")', engine="python")

Subset Variables - columns

df[['width', 'length', 'species']]
    Select multiple columns with specific names.
df['width'] or df.width
    Select a single column with a specific name.
df.filter(regex='regex')
    Select columns whose name matches the regular expression regex.

Subsets - rows and columns

Use df.loc[] and df.iloc[] to select only rows, only columns or both. Use df.at[] and df.iat[] to access a single value by row and column. The first index selects rows, the second index columns.

df.iloc[10:20]
    Select rows 10-20.
df.iloc[:, [1, 2, 5]]
    Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[:, 'x2':'x4']
    Select all columns between x2 and x4 (inclusive).
df.loc[df['a'] > 10, ['a', 'c']]
    Select rows meeting a logical condition, and only the specific columns.
df.iat[1, 2]
    Access a single value by position.
df.at[4, 'A']
    Access a single value by label.

Logic in Python (and pandas)

<    Less than
>    Greater than
==   Equals
<=   Less than or equal to
>=   Greater than or equal to
!=   Not equal to
df.column.isin(values)   Group membership
pd.isnull(obj)           Is NaN
pd.notnull(obj)          Is not NaN
&, |, ~, ^, df.any(), df.all()   Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples

'\.'               Matches strings containing a period '.'
'Length$'          Matches strings ending with the word 'Length'
'^Sepal'           Matches strings beginning with the word 'Sepal'
'^x[1-5]$'         Matches strings beginning with 'x' and ending with 1, 2, 3, 4 or 5
'^(?!Species$).*'  Matches strings except the string 'Species'

Method Chaining

Most pandas methods return a DataFrame so that another pandas method can be applied to the result. This improves readability of code.

df = (pd.melt(df)
        .rename(columns={
            'variable': 'var',
            'value': 'val'})
        .query('val >= 200')
)
Summarize Data

df['w'].value_counts()
    Count the number of rows with each unique value of a variable.
len(df)
    Number of rows in a DataFrame.
df.shape
    Tuple of (number of rows, number of columns) in a DataFrame.
df['w'].nunique()
    Number of distinct values in a column.
df.describe()
    Basic descriptive statistics for each column (or GroupBy).

pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:

sum()                    Sum values of each object.
count()                  Count non-NA/null values of each object.
median()                 Median value of each object.
quantile([0.25, 0.75])   Quantiles of each object.
apply(function)          Apply function to each object.
min()                    Minimum value in each object.
max()                    Maximum value in each object.
mean()                   Mean value of each object.
var()                    Variance of each object.
std()                    Standard deviation of each object.
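As a quick sketch of that behavior (hypothetical columns w and h): applied to a whole DataFrame, a summary function collapses each column to one value and returns a Series indexed by column name.

import pandas as pd

df = pd.DataFrame({'w': [1, 2, 3], 'h': [4, 5, 6]})
totals = df.sum()   # Series: w -> 6, h -> 15
print(totals['w'])  # 6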
Handling Missing Data

df.dropna()
    Drop rows with any column having NA/null data.
df.fillna(value)
    Replace all NA/null data with value.

Make New Columns

df.assign(Area=lambda df: df.Length*df.Height)
    Compute and append one or more new columns.
df['Volume'] = df.Length*df.Height*df.Depth
    Add a single column.
pd.qcut(df.col, n, labels=False)
    Bin a column into n buckets.

pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:

max(axis=1)                Element-wise max.
min(axis=1)                Element-wise min.
clip(lower=-10, upper=10)  Trim values at input thresholds.
abs()                      Absolute value.
shift(1)                   Copy with values shifted by 1.
shift(-1)                  Copy with values lagged by 1.
rank(method='dense')       Ranks with no gaps.
rank(method='min')         Ranks; ties get the min rank.
rank(pct=True)             Ranks rescaled to the interval [0, 1].
rank(method='first')       Ranks; ties go to the first value.
cumsum()                   Cumulative sum.
cummax()                   Cumulative max.
cummin()                   Cumulative min.
cumprod()                  Cumulative product.
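For instance, a small sketch (hypothetical 'sales' column) combining assign with shift to add a lagged variable:

import pandas as pd

df = pd.DataFrame({'sales': [10, 12, 15, 11]})
# previous-row value as a new column; the first row has no predecessor -> NaN
df = df.assign(prev_sales=df['sales'].shift(1))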
Group Data

df.groupby(by="col")
    Return a GroupBy object, grouped by values in the column named "col".
df.groupby(level="ind")
    Return a GroupBy object, grouped by values in the index level named "ind".

All of the summary functions listed above can be applied to a group. Additional GroupBy functions:

size()          Size of each group.
agg(function)   Aggregate a group using function.

The vector functions listed above can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.
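A minimal sketch (hypothetical 'col'/'val' columns) contrasting the two behaviors: agg collapses each group to one row, while a vector function such as shift keeps the original length.

import pandas as pd

df = pd.DataFrame({'col': ['a', 'a', 'b'], 'val': [1, 2, 3]})
g = df.groupby(by='col')
g['val'].agg('sum')   # one value per group: a -> 3, b -> 3
g['val'].shift(1)     # same length as df; shifts within each group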
Windows

df.expanding()
    Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
    Return a Rolling object allowing summary functions to be applied to windows of length n.
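For example, a short sketch (made-up numbers) of a rolling mean over windows of length 2; the first row has an incomplete window and yields NaN:

import pandas as pd

s = pd.Series([1, 2, 4, 8])
s.rolling(2).mean()    # NaN, 1.5, 3.0, 6.0
s.expanding().mean()   # 1.0, 1.5, 2.33..., 3.75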
Plotting

df.plot.hist()
    Histogram for each column.
df.plot.scatter(x='w', y='h')
    Scatter chart using pairs of points.

Combine Data Sets

adf           bdf
 x1  x2        x1  x3
 A   1         A   T
 B   2         B   F
 C   3         D   T

Standard Joins

pd.merge(adf, bdf, how='left', on='x1')
    Join matching rows from bdf to adf.
      x1  x2  x3
      A   1   T
      B   2   F
      C   3   NaN

pd.merge(adf, bdf, how='right', on='x1')
    Join matching rows from adf to bdf.
      x1  x2   x3
      A   1.0  T
      B   2.0  F
      D   NaN  T

pd.merge(adf, bdf, how='inner', on='x1')
    Join data. Retain only rows in both sets.
      x1  x2  x3
      A   1   T
      B   2   F

pd.merge(adf, bdf, how='outer', on='x1')
    Join data. Retain all values, all rows.
      x1  x2   x3
      A   1    T
      B   2    F
      C   3    NaN
      D   NaN  T

Filtering Joins

adf[adf.x1.isin(bdf.x1)]
    All rows in adf that have a match in bdf.
      x1  x2
      A   1
      B   2

adf[~adf.x1.isin(bdf.x1)]
    All rows in adf that do not have a match in bdf.
      x1  x2
      C   3

Set-like Operations

ydf           zdf
 x1  x2        x1  x2
 A   1         B   2
 B   2         C   3
 C   3         D   4

pd.merge(ydf, zdf)
    Rows that appear in both ydf and zdf (intersection).
      x1  x2
      B   2
      C   3

pd.merge(ydf, zdf, how='outer')
    Rows that appear in either or both ydf and zdf (union).
      x1  x2
      A   1
      B   2
      C   3
      D   4

(pd.merge(ydf, zdf, how='outer', indicator=True)
   .query('_merge == "left_only"')
   .drop(columns=['_merge']))
    Rows that appear in ydf but not zdf (set difference).
      x1  x2
      A   1
Cheat sheet for pandas (http://pandas.pydata.org/), originally written by Irv Lustig, Princeton Consultants; inspired by the RStudio Data Wrangling Cheatsheet.
Python For Data Science Cheat Sheet: NumPy Basics
Learn Python for Data Science Interactively at www.DataCamp.com

NumPy

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

Use the following import convention:
>>> import numpy as np

NumPy Arrays

[Diagram: a 1D array has a single axis (axis 0); a 2D array has rows (axis 0) and columns (axis 1); a 3D array adds a third axis (axis 2).]

Creating Arrays

>>> a = np.array([1,2,3])
>>> b = np.array([(1.5,2,3), (4,5,6)], dtype=float)
>>> c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]],
...              dtype=float)

Initial Placeholders

>>> np.zeros((3,4))                   Create an array of zeros
>>> np.ones((2,3,4), dtype=np.int16)  Create an array of ones
>>> d = np.arange(10,25,5)            Create an array of evenly spaced values (step value)
>>> np.linspace(0,2,9)                Create an array of evenly spaced values (number of samples)
>>> e = np.full((2,2),7)              Create a constant array
>>> f = np.eye(2)                     Create a 2x2 identity matrix
>>> np.random.random((2,2))           Create an array with random values
>>> np.empty((3,2))                   Create an empty (uninitialized) array

I/O

Saving & Loading On Disk
>>> np.save('my_array', a)
>>> np.savez('array.npz', a, b)
>>> np.load('my_array.npy')

Saving & Loading Text Files
>>> np.loadtxt("myfile.txt")
>>> np.genfromtxt("my_file.csv", delimiter=',')
>>> np.savetxt("myarray.txt", a, delimiter=" ")

Data Types

>>> np.int64        Signed 64-bit integer types
>>> np.float32      Standard single-precision floating point
>>> np.complex128   Complex numbers represented by two 64-bit floats
>>> np.bool_        Boolean type storing TRUE and FALSE values
>>> np.object_      Python object type
>>> np.string_      Fixed-length string type
>>> np.unicode_     Fixed-length unicode type

Inspecting Your Array

>>> a.shape         Array dimensions
>>> len(a)          Length of array
>>> b.ndim          Number of array dimensions
>>> e.size          Number of array elements
>>> b.dtype         Data type of array elements
>>> b.dtype.name    Name of data type
>>> b.astype(int)   Convert an array to a different type

Asking For Help

>>> np.info(np.ndarray.dtype)

Array Mathematics

Arithmetic Operations

>>> g = a - b                 Subtraction
array([[-0.5,  0. ,  0. ],
       [-3. , -3. , -3. ]])
>>> np.subtract(a,b)          Subtraction
>>> b + a                     Addition
array([[ 2.5,  4. ,  6. ],
       [ 5. ,  7. ,  9. ]])
>>> np.add(b,a)               Addition
>>> a / b                     Division
array([[ 0.66666667,  1.        ,  1.        ],
       [ 0.25      ,  0.4       ,  0.5       ]])
>>> np.divide(a,b)            Division
>>> a * b                     Multiplication
array([[  1.5,   4. ,   9. ],
       [  4. ,  10. ,  18. ]])
>>> np.multiply(a,b)          Multiplication
>>> np.exp(b)                 Exponentiation
>>> np.sqrt(b)                Square root
>>> np.sin(a)                 Element-wise sine
>>> np.cos(b)                 Element-wise cosine
>>> np.log(a)                 Element-wise natural logarithm
>>> e.dot(f)                  Dot product
array([[ 7.,  7.],
       [ 7.,  7.]])

Comparison

>>> a == b                    Element-wise comparison
array([[False,  True,  True],
       [False, False, False]], dtype=bool)
>>> a < 2                     Element-wise comparison
array([ True, False, False], dtype=bool)
>>> np.array_equal(a, b)      Array-wise comparison

Aggregate Functions

>>> a.sum()            Array-wise sum
>>> a.min()            Array-wise minimum value
>>> b.max(axis=0)      Maximum value of each array column
>>> b.cumsum(axis=1)   Cumulative sum of the elements (along each row)
>>> a.mean()           Mean
>>> np.median(b)       Median
>>> np.corrcoef(a)     Correlation coefficient
>>> np.std(b)          Standard deviation
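As a quick sketch of the axis convention used above (m is a made-up 2x3 array): axis=0 runs down the rows, collapsing to one value per column, while axis=1 runs across the columns of each row.

>>> import numpy as np
>>> m = np.array([[1., 2., 3.],
...               [4., 5., 6.]])
>>> m.max(axis=0)      # per-column maxima
array([4., 5., 6.])
>>> m.cumsum(axis=1)   # running sum along each row
array([[ 1.,  3.,  6.],
       [ 4.,  9., 15.]])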
Copying Arrays

>>> h = a.view()   Create a view of the array with the same data
>>> np.copy(a)     Create a copy of the array
>>> h = a.copy()   Create a deep copy of the array
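The difference matters: a view shares memory with the original array, while a copy does not. A small sketch (orig is a made-up array):

>>> import numpy as np
>>> orig = np.array([1, 2, 3])
>>> v = orig.view()
>>> v[0] = 99          # writing through the view changes orig
>>> orig
array([99,  2,  3])
>>> c = orig.copy()
>>> c[0] = 0           # the copy is independent; orig is unchanged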
Sorting Arrays

>>> a.sort()          Sort an array
>>> c.sort(axis=0)    Sort the elements of an array's axis

Subsetting, Slicing, Indexing    (Also see Lists)

Subsetting
>>> a[2]              Select the element at index 2
3
>>> b[1,2]            Select the element at row 1, column 2 (equivalent to b[1][2])
6.0

Slicing
>>> a[0:2]            Select items at index 0 and 1
array([1, 2])
>>> b[0:2,1]          Select items at rows 0 and 1 in column 1
array([ 2.,  5.])
>>> b[:1]             Select all items at row 0 (equivalent to b[0:1, :])
array([[1.5, 2., 3.]])
>>> c[1,...]          Same as [1,:,:]
array([[[ 3.,  2.,  1.],
        [ 4.,  5.,  6.]]])
>>> a[ : :-1]         Reversed array a
array([3, 2, 1])

Boolean Indexing
>>> a[a<2]            Select elements from a less than 2
array([1])

Fancy Indexing
>>> b[[1, 0, 1, 0],[0, 1, 2, 0]]     Select elements (1,0), (0,1), (1,2) and (0,0)
array([ 4. ,  2. ,  6. ,  1.5])
>>> b[[1, 0, 1, 0]][:,[0,1,2,0]]     Select a subset of the matrix's rows and columns
array([[ 4. ,  5. ,  6. ,  4. ],
       [ 1.5,  2. ,  3. ,  1.5],
       [ 4. ,  5. ,  6. ,  4. ],
       [ 1.5,  2. ,  3. ,  1.5]])

Array Manipulation

Transposing Array
>>> i = np.transpose(b)    Permute array dimensions
>>> i.T                    Permute array dimensions

Changing Array Shape
>>> b.ravel()              Flatten the array
>>> g.reshape(3,-2)        Reshape, but don't change data

Adding/Removing Elements
>>> h.resize((2,6))        Resize h to shape (2,6) in place
>>> np.append(h,g)         Append items to an array
>>> np.insert(a, 1, 5)     Insert items in an array
>>> np.delete(a,[1])       Delete items from an array

Combining Arrays
>>> np.concatenate((a,d), axis=0)   Concatenate arrays
array([ 1,  2,  3, 10, 15, 20])
>>> np.vstack((a,b))                Stack arrays vertically (row-wise)
array([[ 1. ,  2. ,  3. ],
       [ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])
>>> np.r_[e,f]                      Stack arrays vertically (row-wise)
>>> np.hstack((e,f))                Stack arrays horizontally (column-wise)
array([[ 7.,  7.,  1.,  0.],
       [ 7.,  7.,  0.,  1.]])
>>> np.column_stack((a,d))          Create stacked column-wise arrays
array([[ 1, 10],
       [ 2, 15],
       [ 3, 20]])
>>> np.c_[a,d]                      Create stacked column-wise arrays

Splitting Arrays
>>> np.hsplit(a,3)                  Split the array horizontally at the 3rd index
[array([1]), array([2]), array([3])]
>>> np.vsplit(c,2)                  Split the array vertically at the 2nd index
[array([[[ 1.5,  2. ,  3. ],
         [ 4. ,  5. ,  6. ]]]),
 array([[[ 3.,  2.,  1.],
         [ 4.,  5.,  6.]]])]

DataCamp
Learn Python for Data Science Interactively
Python For Data Science Cheat Sheet 3 Plotting With Seaborn
Seaborn Axis Grids
Learn Data Science Interactively at www.DataCamp.com >>> g = sns.FacetGrid(titanic, Subplot grid for plotting conditional >>> h = sns.PairGrid(iris) Subplot grid for plotting pairwise
col="survived", relationships >>> h = h.map(plt.scatter) relationships
row="sex") >>> sns.pairplot(iris) Plot pairwise bivariate distributions
>>> g = g.map(plt.hist,"age") >>> i = sns.JointGrid(x="x", Grid for bivariate plot with marginal
>>> sns.factorplot(x="pclass", Draw a categorical plot onto a y="y", univariate plots
y="survived", Facetgrid data=data)
Statistical Data Visualization With Seaborn hue="sex",
data=titanic)
>>> i = i.plot(sns.regplot,
sns.distplot)
The Python visualization library Seaborn is based on >>> sns.lmplot(x="sepal_width", Plot data and regression model fits >>> sns.jointplot("sepal_length", Plot bivariate distribution
y="sepal_length", across a FacetGrid "sepal_width",
matplotlib and provides a high-level interface for drawing hue="species", data=iris,
attractive statistical graphics. data=iris) kind='kde')

Categorical Plots Regression Plots


Make use of the following aliases to import the libraries: >>> sns.regplot(x="sepal_width", Plot data and a linear regression
Scatterplot
>>> import matplotlib.pyplot as plt y="sepal_length", model fit
>>> sns.stripplot(x="species", Scatterplot with one
>>> import seaborn as sns data=iris,
y="petal_length", categorical variable
data=iris) ax=ax)
The basic steps to creating plots with Seaborn are: >>> sns.swarmplot(x="species", Categorical scatterplot with Distribution Plots
y="petal_length", non-overlapping points
1. Prepare some data data=iris) >>> plot = sns.distplot(data.y, Plot univariate distribution
2. Control figure aesthetics Bar Chart kde=False,
color="b")
3. Plot with Seaborn >>> sns.barplot(x="sex", Show point estimates and
y="survived", confidence intervals with Matrix Plots
4. Further customize your plot hue="class", scatterplot glyphs
>>> sns.heatmap(uniform_data,vmin=0,vmax=1) Heatmap
data=titanic)
>>> import matplotlib.pyplot as plt Count Plot
>>>
>>>
>>>
import seaborn as sns
tips = sns.load_dataset("tips")
sns.set_style("whitegrid") Step 2
Step 1
>>> sns.countplot(x="deck",
data=titanic,
Show count of observations
4 Further Customizations Also see Matplotlib
palette="Greens_d")
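For example, a sketch of the temporary-palette pattern mentioned above: the palette returned by sns.color_palette can serve as a context manager, so the colors apply only inside the with block (assumes the titanic dataset loaded above).

>>> import seaborn as sns
>>> titanic = sns.load_dataset("titanic")
>>> with sns.color_palette("husl", 3):
...     sns.boxplot(x="alive", y="age", data=titanic)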
3 Plotting With Seaborn

Axis Grids
>>> g = sns.FacetGrid(titanic,          Subplot grid for plotting conditional relationships
...                   col="survived",
...                   row="sex")
>>> g = g.map(plt.hist, "age")
>>> sns.factorplot(x="pclass",          Draw a categorical plot onto a FacetGrid
...                y="survived",
...                hue="sex",
...                data=titanic)
>>> sns.lmplot(x="sepal_width",         Plot data and regression model fits across a FacetGrid
...            y="sepal_length",
...            hue="species",
...            data=iris)
>>> h = sns.PairGrid(iris)              Subplot grid for plotting pairwise relationships
>>> h = h.map(plt.scatter)
>>> sns.pairplot(iris)                  Plot pairwise bivariate distributions
>>> i = sns.JointGrid(x="x",            Grid for bivariate plot with marginal univariate plots
...                   y="y",
...                   data=data)
>>> i = i.plot(sns.regplot,
...            sns.distplot)
>>> sns.jointplot("sepal_length",       Plot bivariate distribution
...               "sepal_width",
...               data=iris,
...               kind='kde')

Categorical Plots

Scatterplot
>>> sns.stripplot(x="species",          Scatterplot with one categorical variable
...               y="petal_length",
...               data=iris)
>>> sns.swarmplot(x="species",          Categorical scatterplot with non-overlapping points
...               y="petal_length",
...               data=iris)

Bar Chart
>>> sns.barplot(x="sex",                Show point estimates and confidence intervals as rectangular bars
...             y="survived",
...             hue="class",
...             data=titanic)

Count Plot
>>> sns.countplot(x="deck",             Show count of observations
...               data=titanic,
...               palette="Greens_d")

Point Plot
>>> sns.pointplot(x="class",            Show point estimates and confidence intervals with scatterplot glyphs
...               y="survived",
...               hue="sex",
...               data=titanic,
...               palette={"male": "g", "female": "m"},
...               markers=["^","o"],
...               linestyles=["-","--"])

Boxplot
>>> sns.boxplot(x="alive",              Boxplot
...             y="age",
...             hue="adult_male",
...             data=titanic)
>>> sns.boxplot(data=iris, orient="h")  Boxplot with wide-form data

Violinplot
>>> sns.violinplot(x="age",             Violin plot
...                y="sex",
...                hue="survived",
...                data=titanic)

Regression Plots
>>> sns.regplot(x="sepal_width",        Plot data and a linear regression model fit
...             y="sepal_length",
...             data=iris,
...             ax=ax)

Distribution Plots
>>> plot = sns.distplot(data.y,         Plot univariate distribution
...                     kde=False,
...                     color="b")

Matrix Plots
>>> sns.heatmap(uniform_data, vmin=0, vmax=1)   Heatmap

4 Further Customizations    (Also see Matplotlib)

Axisgrid Objects
>>> g.despine(left=True)                 Remove left spine
>>> g.set_ylabels("Survived")            Set the labels of the y-axis
>>> g.set_xticklabels(rotation=45)       Set the tick labels for x
>>> g.set_axis_labels("Survived", "Sex") Set the axis labels
>>> h.set(xlim=(0,5),                    Set the limits and ticks of the x- and y-axis
...       ylim=(0,5),
...       xticks=[0,2.5,5],
...       yticks=[0,2.5,5])

Plot
>>> plt.title("A Title")                 Add plot title
>>> plt.ylabel("Survived")               Adjust the label of the y-axis
>>> plt.xlabel("Sex")                    Adjust the label of the x-axis
>>> plt.ylim(0,100)                      Adjust the limits of the y-axis
>>> plt.xlim(0,10)                       Adjust the limits of the x-axis
>>> plt.setp(ax, yticks=[0,5])           Adjust a plot property
>>> plt.tight_layout()                   Adjust subplot params

5 Show or Save Plot    (Also see Matplotlib)

>>> plt.show()                           Show the plot
>>> plt.savefig("foo.png")               Save the plot as a figure
>>> plt.savefig("foo.png",               Save transparent figure
...             transparent=True)

Close & Clear    (Also see Matplotlib)

>>> plt.cla()                            Clear an axis
>>> plt.clf()                            Clear an entire figure
>>> plt.close()                          Close a window
with to temporarily set the style >>> sns.set_palette(flatui) Set your own color palette DataCamp
Learn Python for Data Science Interactively
Python For Data Science Cheat Sheet: Matplotlib
Learn Python Interactively at www.DataCamp.com

Matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Plot Anatomy & Workflow

[Plot anatomy diagram: a Figure contains one or more Axes/Subplots, each with an X-axis and a Y-axis.]

Workflow

The basic steps to creating plots with matplotlib are:
1 Prepare data   2 Create plot   3 Plot   4 Customize plot   5 Save plot   6 Show plot

>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4]                                    Step 1
>>> y = [10,20,25,30]
>>> fig = plt.figure()                               Step 2
>>> ax = fig.add_subplot(111)                        Step 3
>>> ax.plot(x, y, color='lightblue', linewidth=3)    Step 3, 4
>>> ax.scatter([2,4,6],
...            [5,15,25],
...            color='darkgreen',
...            marker='^')
>>> ax.set_xlim(1, 6.5)
>>> plt.savefig('foo.png')                           Step 5
>>> plt.show()                                       Step 6
1 Prepare The Data    (Also see Lists & NumPy)

1D Data
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> z = np.sin(x)

2D Data or Images
>>> data = 2 * np.random.random((10, 10))
>>> data2 = 3 * np.random.random((10, 10))
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j]
>>> U = -1 - X**2 + Y
>>> V = 1 + X - Y**2
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))

2 Create Plot

>>> import matplotlib.pyplot as plt

Figure
>>> fig = plt.figure()
>>> fig2 = plt.figure(figsize=plt.figaspect(2.0))

Axes

All plotting is done with respect to an Axes. In most cases, a subplot will fit your needs. A subplot is an axes on a grid system.

>>> fig.add_axes()
>>> ax1 = fig.add_subplot(221)    # row-col-num
>>> ax3 = fig.add_subplot(212)
>>> fig3, axes = plt.subplots(nrows=2, ncols=2)
>>> fig4, axes2 = plt.subplots(ncols=3)
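Since plt.subplots returns a NumPy array of Axes, the grid cells can be indexed or iterated; a small sketch (made-up data, building on the fig3/axes grid above):

>>> import matplotlib.pyplot as plt
>>> fig3, axes = plt.subplots(nrows=2, ncols=2)
>>> axes[0, 1].plot([1, 2], [3, 4])   # draw in the upper-right cell
>>> for ax in axes.flat:              # iterate over all four Axes
...     ax.set_xlabel('x')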
3 Plotting Routines

1D Data
>>> lines = ax.plot(x,y)                 Draw points with lines or markers connecting them
>>> ax.scatter(x,y)                      Draw unconnected points, scaled or colored
>>> axes[0,0].bar([1,2,3],[3,4,5])       Plot vertical rectangles (constant width)
>>> axes[1,0].barh([0.5,1,2.5],[0,1,2])  Plot horizontal rectangles (constant height)
>>> axes[1,1].axhline(0.45)              Draw a horizontal line across axes
>>> axes[0,1].axvline(0.65)              Draw a vertical line across axes
>>> ax.fill(x,y,color='blue')            Draw filled polygons
>>> ax.fill_between(x,y,color='yellow')  Fill between y-values and 0

2D Data or Images
>>> fig, ax = plt.subplots()
>>> im = ax.imshow(img,                  Colormapped or RGB arrays
...                cmap='gist_earth',
...                interpolation='nearest',
...                vmin=-2,
...                vmax=2)
>>> axes2[0].pcolor(data2)               Pseudocolor plot of 2D array
>>> axes2[0].pcolormesh(data)            Pseudocolor plot of 2D array
>>> CS = plt.contour(Y,X,U)              Plot contours
>>> axes2[2].contourf(data2)             Plot filled contours
>>> ax.clabel(CS)                        Label a contour plot

Vector Fields
>>> axes[0,1].arrow(0,0,0.5,0.5)         Add an arrow to the axes
>>> axes[1,1].quiver(y,z)                Plot a 2D field of arrows
>>> axes[0,1].streamplot(X,Y,U,V)        Plot 2D vector fields

Data Distributions
>>> ax1.hist(y)                          Plot a histogram
>>> ax3.boxplot(y)                       Make a box and whisker plot
>>> ax3.violinplot(z)                    Make a violin plot

4 Customize Plot

Colors, Color Bars & Color Maps
>>> plt.plot(x, x, x, x**2, x, x**3)
>>> ax.plot(x, y, alpha=0.4)
>>> ax.plot(x, y, c='k')
>>> fig.colorbar(im, orientation='horizontal')
>>> im = ax.imshow(img, cmap='seismic')

Markers
>>> fig, ax = plt.subplots()
>>> ax.scatter(x, y, marker=".")
>>> ax.plot(x, y, marker="o")

Linestyles
>>> plt.plot(x, y, linewidth=4.0)
>>> plt.plot(x, y, ls='solid')
>>> plt.plot(x, y, ls='--')
>>> plt.plot(x, y, '--', x**2, y**2, '-.')
>>> plt.setp(lines, color='r', linewidth=4.0)

Text & Annotations
>>> ax.text(1,
...         -2.1,
...         'Example Graph',
...         style='italic')
>>> ax.annotate("Sine",
...             xy=(8, 0),
...             xycoords='data',
...             xytext=(10.5, 0),
...             textcoords='data',
...             arrowprops=dict(arrowstyle="->",
...                             connectionstyle="arc3"),)

Mathtext
>>> plt.title(r'$\sigma_i=15$', fontsize=20)

Limits, Legends & Layouts

Limits & Autoscaling
>>> ax.margins(x=0.0, y=0.1)                 Add padding to a plot
>>> ax.axis('equal')                         Set the aspect ratio of the plot to 1
>>> ax.set(xlim=[0,10.5], ylim=[-1.5,1.5])   Set limits for x- and y-axis
>>> ax.set_xlim(0, 10.5)                     Set limits for x-axis

Legends
>>> ax.set(title='An Example Axes',          Set a title and x- and y-axis labels
...        ylabel='Y-Axis',
...        xlabel='X-Axis')
>>> ax.legend(loc='best')                    No overlapping plot elements

Ticks
>>> ax.xaxis.set(ticks=range(1,5),           Manually set x-ticks
...              ticklabels=[3,100,-12,"foo"])
>>> ax.tick_params(axis='y',                 Make y-ticks longer and go in and out
...                direction='inout',
...                length=10)

Subplot Spacing
>>> fig3.subplots_adjust(wspace=0.5,         Adjust the spacing between subplots
...                      hspace=0.3,
...                      left=0.125,
...                      right=0.9,
...                      top=0.9,
...                      bottom=0.1)
>>> fig.tight_layout()                       Fit subplot(s) into the figure area

Axis Spines
>>> ax1.spines['top'].set_visible(False)               Make the top axis line for a plot invisible
>>> ax1.spines['bottom'].set_position(('outward', 10)) Move the bottom axis line outward

5 Save Plot

Save figures
>>> plt.savefig('foo.png')
Save transparent figures
>>> plt.savefig('foo.png', transparent=True)

6 Show Plot

>>> plt.show()

Close & Clear
>>> plt.cla()      Clear an axis
>>> plt.clf()      Clear the entire figure
>>> plt.close()    Close a window

DataCamp
Learn Python for Data Science Interactively
Python For Data Science Cheat Sheet: Scikit-Learn
Learn Python for Data Science Interactively at www.DataCamp.com

Scikit-learn

Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example

>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data    (Also see NumPy & Pandas)

Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as a Pandas DataFrame, are also acceptable.

>>> import numpy as np
>>> X = np.random.random((10,5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])
>>> X[X < 0.7] = 0
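For example, a sketch (with a hypothetical two-feature frame) of passing a DataFrame to an estimator directly; scikit-learn converts it to a numeric array internally:

>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> df = pd.DataFrame({'x1': [1., 2., 3.], 'x2': [0., 1., 0.]})
>>> target = [1.0, 2.0, 2.5]
>>> LinearRegression().fit(df, target)   # columns become features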
Training And Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X,
...                                                     y,
...                                                     random_state=0)
Preprocessing The Data

Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Encoding Categorical Features
>>> from sklearn.preprocessing import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)

Imputing Missing Values
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=0, strategy='mean')
>>> imp.fit_transform(X_train)

Generating Polynomial Features
>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Create Your Model

Supervised Learning Estimators

Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()

Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()

KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised learning
>>> lr.fit(X, y)                              Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised learning
>>> k_means.fit(X_train)                      Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)    Fit to data, then transform it

Prediction

Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2,5)))   Predict labels
>>> y_pred = lr.predict(X_test)                     Predict labels
>>> y_pred = knn.predict_proba(X_test)              Estimate probability of a label

Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)                Predict labels in clustering algorithms

Evaluate Your Model's Performance

Classification Metrics

Accuracy Score
>>> knn.score(X_test, y_test)                    Estimator score method
>>> from sklearn.metrics import accuracy_score   Metric scoring functions
>>> accuracy_score(y_test, y_pred)

Classification Report
>>> from sklearn.metrics import classification_report   Precision, recall, f1-score and support
>>> print(classification_report(y_test, y_pred))

Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Tune Your Model

Grid Search
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3),
...           "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn,
...                     param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

Randomized Parameter Optimization
>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1,5),
...           "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn,
...                              param_distributions=params,
...                              cv=4,
...                              n_iter=8,
...                              random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)

DataCamp
Learn Python for Data Science Interactively