Python Data Science Cheat Sheet

This document summarizes key Python concepts for data science: 1) Python basics such as variables, data types, calculations and type conversions, along with commonly used types like lists, strings and NumPy arrays. 2) Common list operations such as accessing elements, slicing, concatenation and list methods, together with the corresponding NumPy array operations and functions. 3) Popular Python libraries for data analysis, machine learning and scientific computing, such as NumPy, Pandas and Matplotlib, plus resources for learning Python for data science interactively.

Uploaded by

Andrew Khalatov

Python For Data Science Cheat Sheet
Python Basics
Learn More Python for Data Science Interactively at [Link]

Variables and Data Types

Variable Assignment
>>> x=5
>>> x
5

Calculations With Variables
>>> x+2 Sum of two variables
7
>>> x-2 Subtraction of two variables
3
>>> x*2 Multiplication of two variables
10
>>> x**2 Exponentiation of a variable
25
>>> x%2 Remainder of a variable
1
>>> x/float(2) Division of a variable
2.5

Types and Type Conversion
str() '5', '3.45', 'True' Variables to strings
int() 5, 3, 1 Variables to integers
float() 5.0, 1.0 Variables to floats
bool() True, True, True Variables to booleans

Asking For Help
>>> help(str)

Strings
>>> my_string = 'thisStringIsAwesome'
>>> my_string
'thisStringIsAwesome'

String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True

String Operations Index starts at 0
>>> my_string[3]
>>> my_string[4:9]

String Methods
>>> my_string.upper() String to uppercase
>>> my_string.lower() String to lowercase
>>> my_string.count('w') Count String elements
>>> my_string.replace('e', 'i') Replace String elements
>>> my_string.strip() Strip whitespaces

Lists Also see NumPy Arrays
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]

Selecting List Elements Index starts at 0
Subset
>>> my_list[1] Select item at index 1
>>> my_list[-3] Select 3rd last item
Slice
>>> my_list[1:3] Select items at index 1 and 2
>>> my_list[1:] Select items after index 0
>>> my_list[:3] Select items before index 3
>>> my_list[:] Copy my_list
Subset Lists of Lists
>>> my_list2[1][0] my_list[list][itemOfList]
>>> my_list2[1][:2]

List Operations
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4 (Python 2 only; Python 3 raises TypeError)
True

List Methods
>>> my_list.index(a) Get the index of an item
>>> my_list.count(a) Count an item
>>> my_list.append('!') Append an item at a time
>>> my_list.remove('!') Remove an item
>>> del(my_list[0:1]) Remove an item
>>> my_list.reverse() Reverse the list
>>> my_list.extend('!') Append an item
>>> my_list.pop(-1) Remove an item
>>> my_list.insert(0,'!') Insert an item
>>> my_list.sort() Sort the list

Libraries
Import libraries
>>> import numpy
>>> import numpy as np
Selective import
>>> from math import pi
Data analysis, Machine learning, Scientific computing, 2D plotting

Install Python
Leading open data science platform powered by Python
Free IDE that is included with Anaconda
Create and share documents with live code, visualizations, text, ...

Numpy Arrays Also see Lists
>>> my_list = [1, 2, 3, 4]
>>> my_array = [Link](my_list)
>>> my_2darray = [Link]([[1,2,3],[4,5,6]])

Selecting Numpy Array Elements Index starts at 0
Subset
>>> my_array[1] Select item at index 1
2
Slice
>>> my_array[0:2] Select items at index 0 and 1
array([1, 2])
Subset 2D Numpy arrays
>>> my_2darray[:,0] my_2darray[rows, columns]
array([1, 4])

Numpy Array Operations
>>> my_array > 3
array([False, False, False, True], dtype=bool)
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + [Link]([5, 6, 7, 8])
array([6, 8, 10, 12])

Numpy Array Functions
>>> my_array.shape Get the dimensions of the array
>>> [Link](other_array) Append items to an array
>>> [Link](my_array, 1, 5) Insert items in an array
>>> [Link](my_array,[1]) Delete items in an array
>>> [Link](my_array) Mean of the array
>>> [Link](my_array) Median of the array
>>> np.corrcoef(my_array) Correlation coefficient
>>> [Link](my_array) Standard deviation

DataCamp
Learn Python for Data Science Interactively
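The list selection and list method entries on this page can be tried together in one short script. The sketch below reuses the sheet's `my_list` and `my_list2`; the other variable names (`window`, `copy_of_list`, `last`, and so on) are illustrative only. It works on a slice copy so that the original list stays unchanged.

```python
a = 'is'
b = 'nice'
my_list = ['my', 'list', a, b]
my_list2 = [[4, 5, 6, 7], [3, 4, 5, 6]]

# Subsetting and slicing (indices start at 0)
item = my_list[1]          # item at index 1
third_last = my_list[-3]   # third item counted from the end
window = my_list[1:3]      # items at index 1 and 2
inner = my_list2[1][:2]    # first two items of the second nested list

# my_list[:] makes a shallow copy, so the methods below leave my_list alone
copy_of_list = my_list[:]
copy_of_list.append('!')            # append adds exactly one item
copy_of_list.extend(['so', 'far'])  # extend adds every item of an iterable
copy_of_list.remove('!')            # remove deletes the first matching value
last = copy_of_list.pop(-1)         # pop removes and returns an item by index
copy_of_list.sort()                 # sort reorders the list in place

print(window, inner, last, copy_of_list)
```

Note that `extend` iterates over its argument, which is why `my_list.extend('!')` on the sheet behaves like a single append: the string has only one character.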
Python For Data Science Cheat Sheet Advanced Indexing Also see NumPy Arrays Combining Data
Selecting data1 data2
Pandas >>> [Link][:,(df3>1).any()] Select cols with any vals >1 X1 X2 X1 X3
Learn Python for Data Science Interactively at [Link] >>> [Link][:,(df3>1).all()] Select cols with vals > 1
>>> [Link][:,[Link]().any()] Select cols with NaN a 11.432 a 20.784
>>> [Link][:,[Link]().all()] Select cols without NaN b 1.303 b NaN
Indexing With isin c 99.906 d 20.784
>>> df[([Link]([Link]))] Find same elements
Reshaping Data >>> [Link](items=["a","b"]) Filter on values
Merge
>>> [Link](lambda x: not x%5) Select specific elements
Pivot Where X1 X2 X3
>>> [Link](data1,
>>> df3= [Link](index='Date', Spread rows into columns >>> [Link](s > 0) Subset the data data2, a 11.432 20.784
columns='Type', Query how='left',
values='Value') b 1.303 NaN
>>> [Link]('second > first') Query DataFrame on='X1')
c 99.906 NaN
Date Type Value

0 2016-03-01 a 11.432 Type a b c Setting/Resetting Index >>> [Link](data1, X1 X2 X3

1 2016-03-02 b 13.031 Date data2, a 11.432 20.784
>>> df.set_index('Country') Set the index
how='right',
2 2016-03-01 c 20.784 2016-03-01 11.432 NaN 20.784 >>> df4 = df.reset_index() Reset the index b 1.303 NaN
on='X1')
3 2016-03-03 a 99.906 >>> df = [Link](index=str, Rename DataFrame d NaN 20.784
2016-03-02 1.303 13.031 NaN columns={"Country":"cntry",
4 2016-03-02 a 1.303 "Capital":"cptl", >>> [Link](data1,
2016-03-03 99.906 NaN 20.784 "Population":"ppltn"}) X1 X2 X3
5 2016-03-03 c 20.784 data2,
how='inner', a 11.432 20.784
Pivot Table Reindexing on='X1') b 1.303 NaN
>>> s2 = [Link](['a','c','d','e','b'])
>>> df4 = pd.pivot_table(df2, Spread rows into columns X1 X2 X3
values='Value', Forward Filling Backward Filling >>> [Link](data1,
index='Date', data2, a 11.432 20.784
columns='Type') >>> [Link](range(4), >>> s3 = [Link](range(5), how='outer', b 1.303 NaN
method='ffill') method='bfill') on='X1') c 99.906 NaN
Stack / Unstack Country Capital Population 0 3
0 Belgium Brussels 11190846 1 3 d NaN 20.784
>>> stacked = [Link]() Pivot a level of column labels 1 India New Delhi 1303171035 2 3
>>> [Link]() Pivot a level of index labels 2 Brazil Brasília 207847528 3 3 Join
3 Brazil Brasília 207847528 4 3
0 1 1 5 0 0.233482 >>> [Link](data2, how='right')
1 5 0.233482 0.390959 1 0.390959 MultiIndexing Concatenate
2 4 0.184713 0.237102 2 4 0 0.184713
>>> arrays = [[Link]([1,2,3]),
3 3 0.433522 0.429401 1 0.237102 [Link]([5,4,3])] Vertical
>>> df5 = [Link]([Link](3, 2), index=arrays) >>> [Link](s2)
Unstacked 3 3 0 0.433522
>>> tuples = list(zip(*arrays)) Horizontal/Vertical
1 0.429401 >>> index = [Link].from_tuples(tuples, >>> [Link]([s,s2],axis=1, keys=['One','Two'])
Stacked names=['first', 'second']) >>> [Link]([data1, data2], axis=1, join='inner')
>>> df6 = [Link]([Link](3, 2), index=index)
Melt >>> df2.set_index(["Date", "Type"])
>>> [Link](df2, Gather columns into rows
Dates
id_vars=["Date"],
value_vars=["Type", "Value"],
Duplicate Data >>> df2['Date']= pd.to_datetime(df2['Date'])
>>> df2['Date']= pd.date_range('2000-1-1',
value_name="Observations") >>> [Link]() Return unique values periods=6,
>>> [Link]('Type') Check duplicates freq='M')
Date Type Value
Date Variable Observations >>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
0 2016-03-01 Type a >>> df2.drop_duplicates('Type', keep='last') Drop duplicates >>> index = [Link](dates)
0 2016-03-01 a 11.432 1 2016-03-02 Type b
>>> [Link]() Check index duplicates >>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')
1 2016-03-02 b 13.031 2 2016-03-01 Type c
2 2016-03-01 c 20.784 3 2016-03-03 Type a Grouping Data Visualization Also see Matplotlib
4 2016-03-02 Type a
3 2016-03-03 a 99.906
5 2016-03-03 Type c Aggregation >>> import [Link] as plt
4 2016-03-02 a 1.303 >>> [Link](by=['Date','Type']).mean()
6 2016-03-01 Value 11.432 >>> [Link]() >>> [Link]()
>>> [Link](level=0).sum()
5 2016-03-03 c 20.784 7 2016-03-02 Value 13.031 >>> [Link](level=0).agg({'a':lambda x:sum(x)/len(x), >>> [Link]() >>> [Link]()
8 2016-03-01 Value 20.784 'b': [Link]})
9 2016-03-03 Value 99.906 Transformation
>>> customSum = lambda x: (x+x%2)
10 2016-03-02 Value 1.303
>>> [Link](level=0).transform(customSum)
11 2016-03-03 Value 20.784
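The Pivot and Melt tables above can be reproduced with a small script. This is a sketch that assumes the linked calls are `DataFrame.pivot` and `pd.melt`, and it rebuilds `df2` from the six rows shown in the sheet; the name `melted` is illustrative.

```python
import pandas as pd

# The six rows of the Date / Type / Value table from the sheet
df2 = pd.DataFrame({
    "Date": ["2016-03-01", "2016-03-02", "2016-03-01",
             "2016-03-03", "2016-03-02", "2016-03-03"],
    "Type": ["a", "b", "c", "a", "a", "c"],
    "Value": [11.432, 13.031, 20.784, 99.906, 1.303, 20.784],
})

# Pivot: spread rows into columns, one column per Type value.
# Date/Type combinations that never occur become NaN.
df3 = df2.pivot(index="Date", columns="Type", values="Value")

# Melt: gather the Type and Value columns back into rows
melted = pd.melt(df2,
                 id_vars=["Date"],
                 value_vars=["Type", "Value"],
                 value_name="Observations")

print(df3)
print(melted)
```

`pivot` raises an error if an index/column pair occurs more than once; in that case `pd.pivot_table`, which aggregates duplicates, is the usual alternative.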

Iteration Missing Data

>>> [Link]() Drop NaN values
>>> [Link]() (Column-index, Series) pairs >>> [Link]([Link]()) Fill NaN values with a predetermined value
>>> [Link]() (Row-index, Series) pairs >>> [Link]("a", "f") Replace values with others
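To see how the four `how=` options under Combining Data differ, the sketch below rebuilds `data1` and `data2` from the tables on this page and merges them on `X1`. It assumes the linked call is `pd.merge`; the result names `left`, `right`, `inner` and `outer` are illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({"X1": ["a", "b", "c"],
                      "X2": [11.432, 1.303, 99.906]})
data2 = pd.DataFrame({"X1": ["a", "b", "d"],
                      "X3": [20.784, float("nan"), 20.784]})

# left: keep every key of data1; unmatched X3 values become NaN
left = pd.merge(data1, data2, how="left", on="X1")
# right: keep every key of data2; unmatched X2 values become NaN
right = pd.merge(data1, data2, how="right", on="X1")
# inner: keep only keys present in both frames
inner = pd.merge(data1, data2, how="inner", on="X1")
# outer: keep the union of keys from both frames
outer = pd.merge(data1, data2, how="outer", on="X1")

for name, frame in [("left", left), ("right", right),
                    ("inner", inner), ("outer", outer)]:
    print(name)
    print(frame)
```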
Python For Data Science Cheat Sheet
NumPy Basics
Learn Python for Data Science Interactively at [Link]
Subsetting, Slicing, Indexing Also see Lists
Subsetting
>>> a[2] Select the element at the 2nd index
3
>>> b[1,2] Select the element at row 1 column 2 (equivalent to b[1][2])
6.0
Inspecting Your Array
>>> [Link] Array dimensions
>>> len(a) Length of array
>>> [Link] Number of array dimensions
>>> [Link] Number of array elements
>>> [Link] Data type of array elements
>>> [Link] Name of data type
>>> [Link](int) Convert an array to a different type Slicing
NumPy >>> a[0:2]
array([1, 2])
1 2 3 Select items at index 0 and 1
2
The NumPy library is the core library for scientific computing in Asking For Help >>> b[0:2,1] 1.5 2 3 Select items at rows 0 and 1 in column 1
>>> [Link]([Link]) array([ 2., 5.]) 4 5 6
Python. It provides a high-performance multidimensional array
Array Mathematics
1.5 2 3
>>> b[:1] Select all items at row 0
object, and tools for working with these arrays. array([[1.5, 2., 3.]]) 4 5 6 (equivalent to b[0:1, :])
Arithmetic Operations >>> c[1,...] Same as [1,:,:]
Use the following import convention: array([[[ 3., 2., 1.],
>>> import numpy as np [ 4., 5., 6.]]])
>>> g = a - b Subtraction
array([[-0.5, 0. , 0. ], >>> a[ : :-1] Reversed array a
NumPy Arrays [-3. , -3. , -3. ]])
array([3, 2, 1])

>>> [Link](a,b) Boolean Indexing

1D array 2D array 3D array Subtraction
>>> a[a<2] Select elements from a less than 2
>>> b + a Addition 1 2 3
array([[ 2.5, 4. , 6. ], array([1])
axis 1 axis 2
1 2 3 axis 1 [ 5. , 7. , 9. ]]) Fancy Indexing
1.5 2 3 >>> [Link](b,a) Addition >>> b[[1, 0, 1, 0],[0, 1, 2, 0]] Select elements (1,0),(0,1),(1,2) and (0,0)
axis 0 axis 0 array([ 4. , 2. , 6. , 1.5])
4 5 6 >>> a / b Division
array([[ 0.66666667, 1. , 1. ],
[ 0.25 , 0.4 , 0.5 ]])
>>> [Link](a,b) Division
>>> a * b Multiplication
array([[ 1.5, 4. , 9. ],
[ 4. , 10. , 18. ]])
>>> b[[1, 0, 1, 0]][:,[0,1,2,0]] Select a subset of the matrix’s rows and columns
array([[ 4. , 5. , 6. , 4. ],
[ 1.5, 2. , 3. , 1.5],
[ 4. , 5. , 6. , 4. ],
[ 1.5, 2. , 3. , 1.5]])
Creating Arrays
>>> a = [Link]([1,2,3])

>>> b = [Link]([(1.5,2,3), (4,5,6)], dtype = float) >>> [Link](a,b) Multiplication Array Manipulation
>>> c = [Link]([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], >>> [Link](b) Exponentiation
dtype = float) >>> [Link](b) Square root Transposing Array
>>> [Link](a) Print sines of an array >>> i = [Link](b) Permute array dimensions
Initial Placeholders >>> [Link](b) Element-wise cosine >>> i.T Permute array dimensions
>>> [Link](a) Element-wise natural logarithm
>>> [Link]((3,4)) Create an array of zeros >>> [Link](f) Dot product
Changing Array Shape
>>> [Link]((2,3,4),dtype=np.int16) Create an array of ones array([[ 7., 7.], >>> [Link]() Flatten the array
>>> d = [Link](10,25,5) Create an array of evenly [ 7., 7.]]) >>> [Link](3,-2) Reshape, but don’t change data
spaced values (step value)
>>> [Link](0,2,9) Create an array of evenly Comparison Adding/Removing Elements
spaced values (number of samples) >>> [Link]((2,6)) Return a new array with shape (2,6)
>>> e = [Link]((2,2),7) Create a constant array >>> a == b Element-wise comparison >>> [Link](h,g) Append items to an array
>>> f = [Link](2) Create a 2X2 identity matrix array([[False, True, True], >>> [Link](a, 1, 5) Insert items in an array
>>> [Link]((2,2)) Create an array with random values [False, False, False]], dtype=bool) >>> [Link](a,[1]) Delete items from an array
>>> [Link]((3,2)) Create an empty array >>> a < 2 Element-wise comparison
array([True, False, False], dtype=bool) Combining Arrays
>>> np.array_equal(a, b) Array-wise comparison >>> [Link]((a,d),axis=0) Concatenate arrays
I/O
array([ 1, 2, 3, 10, 15, 20])
>>> [Link]((a,b)) Stack arrays vertically (row-wise)
Aggregate Functions array([[ 1. , 2. , 3. ],
Saving & Loading On Disk [ 1.5, 2. , 3. ],
>>> [Link]() Array-wise sum [ 4. , 5. , 6. ]])
>>> [Link]('my_array', a) >>> [Link]() Array-wise minimum value >>> np.r_[e,f] Stack arrays vertically (row-wise)
>>> [Link]('[Link]', a, b) >>> [Link](axis=0) Maximum value of an array row >>> [Link]((e,f)) Stack arrays horizontally (column-wise)
>>> [Link]('my_array.npy') >>> [Link](axis=1) Cumulative sum of the elements array([[ 7., 7., 1., 0.],
>>> [Link]() Mean [ 7., 7., 0., 1.]])
Saving & Loading Text Files >>> [Link]() Median >>> np.column_stack((a,d)) Create stacked column-wise arrays
>>> [Link]("[Link]") >>> [Link]() Correlation coefficient array([[ 1, 10],
>>> [Link](b) Standard deviation [ 2, 15],
>>> [Link]("my_file.csv", delimiter=',') [ 3, 20]])
>>> [Link]("[Link]", a, delimiter=" ") >>> np.c_[a,d] Create stacked column-wise arrays
Copying Arrays Splitting Arrays
Data Types >>> h = [Link]() Create a view of the array with the same data >>> [Link](a,3) Split the array horizontally at the 3rd
>>> [Link](a) Create a copy of the array [array([1]),array([2]),array([3])] index
>>> np.int64 Signed 64-bit integer types >>> [Link](c,2) Split the array vertically at the 2nd index
>>> np.float32 Single-precision floating point >>> h = [Link]() Create a deep copy of the array [array([[[ 1.5, 2. , 1. ],
>>> [Link] Complex numbers represented by two 64-bit floats (128 bits) [ 4. , 5. , 6. ]]]),
array([[[ 3., 2., 3.],
>>> [Link] Boolean type storing TRUE and FALSE values
>>> [Link] Python object type Sorting Arrays [ 4., 5., 6.]]])]

>>> np.string_ Fixed-length string type >>> [Link]() Sort an array

>>> np.unicode_ Fixed-length unicode type >>> [Link](axis=0) Sort the elements of an array's axis
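The arithmetic, indexing and combining entries on this page rely on the same arrays `a`, `b` and `d`. The sketch below assumes the linked creation calls are `np.array` and `np.arange`, and recomputes several of the outputs shown above; the result names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
d = np.arange(10, 25, 5)   # evenly spaced values: 10, 15, 20

# Arithmetic broadcasts the 1D array a across each row of b
difference = a - b
product = a * b

# Boolean indexing keeps the elements where the mask is True
small = a[a < 2]

# Fancy indexing pairs up row and column indices:
# elements (1,0), (0,1), (1,2) and (0,0)
picked = b[[1, 0, 1, 0], [0, 1, 2, 0]]

# Combining arrays
joined = np.concatenate((a, d), axis=0)
stacked = np.vstack((a, b))

print(difference, product, small, picked, joined, stacked, sep="\n")
```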
Python For Data Science Cheat Sheet Plot Anatomy & Workflow
Plot Anatomy Workflow
Matplotlib Axes/Subplot The basic steps to creating plots with matplotlib are:
Learn Python Interactively at [Link] 1 Prepare data 2 Create plot 3 Plot 4 Customize plot 5 Save plot 6 Show plot
>>> import [Link] as plt
>>> x = [1,2,3,4] Step 1
>>> y = [10,20,25,30]
>>> fig = [Link]() Step 2
Matplotlib Y-axis Figure >>> ax = fig.add_subplot(111) Step 3
>>> [Link](x, y, color='lightblue', linewidth=3) Step 3, 4
Matplotlib is a Python 2D plotting library which produces >>> [Link]([2,4,6],
publication-quality figures in a variety of hardcopy formats [5,15,25],
color='darkgreen',
and interactive environments across marker='^')
platforms. >>> ax.set_xlim(1, 6.5)
X-axis
>>> [Link]('[Link]')

1 Prepare The Data Also see Lists & NumPy

>>> [Link]() Step 6
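The six workflow steps can be run end to end without a display. The sketch below assumes the linked calls are `plt.figure`, `ax.plot`, `ax.scatter` and `fig.savefig`; it selects the non-interactive Agg backend and saves into an in-memory buffer (the `buffer` and `png_bytes` names are illustrative) rather than to a file, so step 6 (`plt.show()`) is left out.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Step 1: prepare data
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Step 2: create the figure and one subplot
fig = plt.figure()
ax = fig.add_subplot(111)

# Steps 3 and 4: plot and customize
ax.plot(x, y, color="lightblue", linewidth=3)
ax.scatter([2, 4, 6], [5, 15, 25], color="darkgreen", marker="^")
ax.set_xlim(1, 6.5)

# Step 5: save the figure (here as PNG bytes in memory)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

plt.close(fig)
print(len(png_bytes), "bytes written")
```

In an interactive session, replacing the Agg line with the default backend and calling `plt.show()` before closing displays the same figure in a window.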

1D Data 4 Customize Plot

>>> import numpy as np Colors, Color Bars & Color Maps Mathtext
>>> x = [Link](0, 10, 100)
>>> y = [Link](x) >>> [Link](x, x, x, x**2, x, x**3) >>> [Link](r'$\sigma_i=15$', fontsize=20)
>>> z = [Link](x) >>> [Link](x, y, alpha = 0.4)
>>> [Link](x, y, c='k') Limits, Legends & Layouts
2D Data or Images >>> [Link](im, orientation='horizontal')
>>> im = [Link](img, Limits & Autoscaling
>>> data = 2 * [Link]((10, 10)) cmap='seismic')
>>> data2 = 3 * [Link]((10, 10)) >>> [Link](x=0.0,y=0.1) Add padding to a plot
>>> Y, X = [Link][-[Link]j, -[Link]j] >>> [Link]('equal') Set the aspect ratio of the plot to 1
Markers >>> [Link](xlim=[0,10.5],ylim=[-1.5,1.5]) Set limits for x-and y-axis
>>> U = -1 - X**2 + Y
>>> V = 1 + X - Y**2 >>> fig, ax = [Link]() >>> ax.set_xlim(0,10.5) Set limits for x-axis
>>> from [Link] import get_sample_data >>> [Link](x,y,marker=".") Legends
>>> img = [Link](get_sample_data('axes_grid/bivariate_normal.npy')) >>> [Link](x,y,marker="o") >>> [Link](title='An Example Axes', Set a title and x-and y-axis labels
ylabel='Y-Axis',
Linestyles xlabel='X-Axis')
>>> [Link](x,y,linewidth=4.0)
>>> [Link](x,y,ls='solid')
>>> [Link](loc='best') No overlapping plot elements
Ticks
2 Create Plot

>>> import [Link] as plt >>> [Link](ticks=range(1,5), Manually set x-ticks

>>> [Link](x,y,ls='--') ticklabels=[3,100,-12,"foo"])
Figure >>> [Link](x,y,'--',x**2,y**2,'-.') >>> ax.tick_params(axis='y', Make y-ticks longer and go in and out
>>> [Link](lines,color='r',linewidth=4.0) direction='inout',
>>> fig = [Link]() length=10)
>>> fig2 = [Link](figsize=[Link](2.0)) Text & Annotations
Subplot Spacing
Axes >>> [Link](1, >>> fig3.subplots_adjust(wspace=0.5, Adjust the spacing between subplots
-2.1, hspace=0.3,
All plotting is done with respect to an Axes. In most cases, a 'Example Graph', left=0.125,
style='italic') right=0.9,
subplot will fit your needs. A subplot is an axes on a grid system. >>> [Link]("Sine", top=0.9,
>>> fig.add_axes() xy=(8, 0), bottom=0.1)
>>> ax1 = fig.add_subplot(221) # row-col-num xycoords='data', >>> fig.tight_layout() Fit subplot(s) in to the figure area
xytext=(10.5, 0),
>>> ax3 = fig.add_subplot(212) textcoords='data', Axis Spines
>>> fig3, axes = [Link](nrows=2,ncols=2) arrowprops=dict(arrowstyle="->", >>> [Link]['top'].set_visible(False) Make the top axis line for a plot invisible
>>> fig4, axes2 = [Link](ncols=3) connectionstyle="arc3"),) >>> [Link]['bottom'].set_position(('outward',10)) Move the bottom axis line outward

3 Plotting Routines 5 Save Plot

Save figures
1D Data Vector Fields >>> [Link]('[Link]')
>>> lines = [Link](x,y) Draw points with lines or markers connecting them >>> axes[0,1].arrow(0,0,0.5,0.5) Add an arrow to the axes Save transparent figures
>>> [Link](x,y) Draw unconnected points, scaled or colored >>> axes[1,1].quiver(y,z) Plot a 2D field of arrows >>> [Link]('[Link]', transparent=True)
>>> axes[0,0].bar([1,2,3],[3,4,5]) Plot vertical rectangles (constant width) >>> axes[0,1].streamplot(X,Y,U,V) Plot 2D vector fields
>>> axes[1,0].barh([0.5,1,2.5],[0,1,2]) Plot horizontal rectangles (constant height)
>>> axes[1,1].axhline(0.45) Draw a horizontal line across axes Data Distributions 6 Show Plot
>>> axes[0,1].axvline(0.65) Draw a vertical line across axes >>> [Link](y) Plot a histogram
>>> [Link](x,y,color='blue') Draw filled polygons >>> [Link](y) Make a box and whisker plot >>> [Link]()
>>> ax.fill_between(x,y,color='yellow') Fill between y-values and 0 >>> [Link](z) Make a violin plot
2D Data or Images Close & Clear
>>> fig, ax = [Link]() >>> [Link]() Clear an axis
>>> axes2[0].pcolor(data2) Pseudocolor plot of 2D array >>> [Link]() Clear the entire figure
>>> im = [Link](img, Colormapped or RGB arrays >>> axes2[0].pcolormesh(data) Pseudocolor plot of 2D array
cmap='gist_earth', >>> [Link]() Close a window
interpolation='nearest', >>> CS = [Link](Y,X,U) Plot contours
vmin=-2, >>> axes2[2].contourf(data1) Plot filled contours
vmax=2) >>> axes2[2]= [Link](CS) Label a contour plot
Python For Data Science Cheat Sheet 3 Plotting With Seaborn
Seaborn Axis Grids
Learn Data Science Interactively at [Link] >>> g = [Link](titanic, Subplot grid for plotting conditional >>> h = [Link](iris) Subplot grid for plotting pairwise
col="survived", relationships >>> h = [Link]([Link]) relationships
row="sex") >>> [Link](iris) Plot pairwise bivariate distributions
>>> g = [Link]([Link],"age") >>> i = [Link](x="x", Grid for bivariate plot with marginal
>>> [Link](x="pclass", Draw a categorical plot onto a y="y", univariate plots
y="survived", Facetgrid data=data)
Statistical Data Visualization With Seaborn hue="sex",
data=titanic)
>>> i = [Link]([Link],
[Link])
The Python visualization library Seaborn is based on >>> [Link](x="sepal_width", Plot data and regression model fits >>> [Link]("sepal_length", Plot bivariate distribution
y="sepal_length", across a FacetGrid "sepal_width",
matplotlib and provides a high-level interface for drawing hue="species", data=iris,
attractive statistical graphics. data=iris) kind='kde')

Categorical Plots Regression Plots

Make use of the following aliases to import the libraries: >>> [Link](x="sepal_width", Plot data and a linear regression
Scatterplot
>>> import [Link] as plt y="sepal_length", model fit
>>> [Link](x="species", Scatterplot with one
>>> import seaborn as sns data=iris,
y="petal_length", categorical variable
data=iris) ax=ax)
The basic steps to creating plots with Seaborn are: >>> [Link](x="species", Categorical scatterplot with Distribution Plots
y="petal_length", non-overlapping points
1. Prepare some data data=iris) >>> plot = [Link](data.y, Plot univariate distribution
2. Control figure aesthetics Bar Chart kde=False,
color="b")
3. Plot with Seaborn >>> [Link](x="sex", Show point estimates and
y="survived", confidence intervals with Matrix Plots
4. Further customize your plot hue="class", scatterplot glyphs
>>> [Link](uniform_data,vmin=0,vmax=1) Heatmap
data=titanic)
>>> import [Link] as plt Count Plot
>>> [Link](x="deck", Show count of observations
data=titanic,
palette="Greens_d")
4 Further Customizations Also see Matplotlib
>>> import seaborn as sns
>>> tips = sns.load_dataset("tips") Step 1
>>> sns.set_style("whitegrid") Step 2
>>> g = [Link](x="tip", Step 3
Point Plot Axisgrid Objects
y="total_bill",
data=tips, >>> [Link](x="class", Show point estimates and >>> [Link](left=True) Remove left spine
aspect=2) y="survived", confidence intervals as >>> g.set_ylabels("Survived") Set the labels of the y-axis
>>> g = (g.set_axis_labels("Tip","Total bill(USD)"). hue="sex", rectangular bars >>> g.set_xticklabels(rotation=45) Set the tick labels for x
set(xlim=(0,10),ylim=(0,100))) data=titanic, >>> g.set_axis_labels("Survived", Set the axis labels
Step 4 palette={"male":"g", "Sex")
>>> [Link]("title")
>>> [Link](g) Step 5 "female":"m"}, >>> [Link](xlim=(0,5), Set the limit and ticks of the
markers=["^","o"], ylim=(0,5), x-and y-axis
linestyles=["-","--"]) xticks=[0,2.5,5],

1
Boxplot yticks=[0,2.5,5])
Data Also see Lists, NumPy & Pandas >>> [Link](x="alive", Boxplot
Plot
y="age",
>>> import pandas as pd hue="adult_male",
>>> import numpy as np >>> [Link]("A Title") Add plot title
data=titanic)
>>> uniform_data = [Link](10, 12) >>> [Link]("Survived") Adjust the label of the y-axis
>>> [Link](data=iris,orient="h") Boxplot with wide-form data
>>> data = [Link]({'x':[Link](1,101), >>> [Link]("Sex") Adjust the label of the x-axis
'y':[Link](0,4,100)}) Violinplot >>> [Link](0,100) Adjust the limits of the y-axis
>>> [Link](x="age", Violin plot >>> [Link](0,10) Adjust the limits of the x-axis
Seaborn also offers built-in data sets: y="sex", >>> [Link](ax,yticks=[0,5]) Adjust a plot property
>>> titanic = sns.load_dataset("titanic") hue="survived", >>> plt.tight_layout() Adjust subplot params
>>> iris = sns.load_dataset("iris") data=titanic)

2 Figure Aesthetics Also see Matplotlib

5 Show or Save Plot Also see Matplotlib
>>> [Link]() Show the plot
Context Functions >>> [Link]("[Link]") Save the plot as a figure
>>> f, ax = [Link](figsize=(5,6)) Create a figure and one subplot >>> [Link]("[Link]", Save transparent figure
>>> sns.set_context("talk") Set context to "talk" transparent=True)
>>> sns.set_context("notebook", Set context to "notebook",
Seaborn styles font_scale=1.5, Scale font elements and
>>> [Link]() (Re)set the seaborn default
rc={"[Link]":2.5}) override param mapping Close & Clear Also see Matplotlib
>>> sns.set_style("whitegrid") Set the matplotlib parameters Color Palette >>> [Link]() Clear an axis
>>> sns.set_style("ticks", Set the matplotlib parameters >>> [Link]() Clear an entire figure
{"[Link]":8, >>> sns.set_palette("husl",3) Define the color palette >>> [Link]() Close a window
"[Link]":8}) >>> sns.color_palette("husl") Use with with to temporarily set palette
>>> sns.axes_style("whitegrid") Return a dict of params or use with >>> flatui = ["#9b59b6","#3498db","#95a5a6","#e74c3c","#34495e","#2ecc71"]
with to temporarily set the style >>> sns.set_palette(flatui) Set your own color palette
Python For Data Science Cheat Sheet Retrieving RDD Information Reshaping Data
Basic Information Reducing
PySpark - RDD Basics >>> [Link]() List the number of partitions
>>> [Link](lambda x,y : x+y) Merge the rdd values for each key
.collect()
Learn Python for data science Interactively at [Link] >>> [Link]() Count RDD instances [('a',9),('b',2)]
3 >>> [Link](lambda a, b: a + b) Merge the rdd values
>>> [Link]() Count RDD instances by key ('a',7,'a',2,'b',2)
defaultdict(<type 'int'>,{'a':2,'b':1}) Grouping by
>>> [Link]() Count RDD instances by value >>> [Link](lambda x: x % 2) Return RDD of grouped values
Spark defaultdict(<type 'int'>,{('b',2):1,('a',2):1,('a',7):1})
>>> [Link]() Return (key,value) pairs as a
.mapValues(list)
.collect()
PySpark is the Spark Python API that exposes {'a': 2,'b': 2} dictionary >>> [Link]() Group rdd by key
>>> [Link]() Sum of RDD elements .mapValues(list)
the Spark programming model to Python. 4950 .collect()
>>> [Link]([]).isEmpty() Check whether RDD is empty [('a',[7,2]),('b',[2])]
True
Initializing Spark Summary
Aggregating
>>> seqOp = (lambda x,y: (x[0]+y,x[1]+1))
>>> combOp = (lambda x,y:(x[0]+y[0],x[1]+y[1]))
SparkContext >>> [Link]() Maximum value of RDD elements >>> [Link]((0,0),seqOp,combOp) Aggregate RDD elements of each
99 (4950,100) partition and then the results
>>> from pyspark import SparkContext >>> [Link]() Minimum value of RDD elements
>>> sc = SparkContext(master = 'local[2]') >>> [Link]((0,0),seqOp,combOp) Aggregate values of each RDD key
0
>>> [Link]() Mean value of RDD elements .collect()
Inspect SparkContext 49.5 [('a',(9,2)), ('b',(2,1))]
>>> [Link]() Standard deviation of RDD elements >>> [Link](0,add) Aggregate the elements of each
>>> [Link] Retrieve SparkContext version 28.866070047722118 4950 partition, and then the results
>>> [Link] Retrieve Python version >>> [Link]() Compute variance of RDD elements >>> [Link](0, add) Merge the values for each key
>>> [Link] Master URL to connect to 833.25 .collect()
>>> str([Link]) Path where Spark is installed on worker nodes >>> [Link](3) Compute histogram by bins [('a',9),('b',2)]
>>> str([Link]()) Retrieve name of the Spark User running ([0,33,66,99],[33,33,34])
>>> [Link]() Summary statistics (count, mean, stdev, max & >>> [Link](lambda x: x+x) Create tuples of RDD elements by
SparkContext
>>> [Link] Return application name min) .collect() applying a function
>>> [Link] Retrieve application ID
>>> [Link] Return default level of parallelism
>>> [Link] Default minimum number of partitions for Applying Functions Mathematical Operations
RDDs >>> [Link](lambda x: x+(x[1],x[0])) Apply a function to each RDD element >>> [Link](rdd2) Return each rdd value not contained
.collect() .collect() in rdd2
Configuration [('a',7,7,'a'),('a',2,2,'a'),('b',2,2,'b')] [('b',2),('a',7)]
>>> rdd5 = [Link](lambda x: x+(x[1],x[0])) Apply a function to each RDD element >>> [Link](rdd) Return each (key,value) pair of rdd2
>>> from pyspark import SparkConf, SparkContext and flatten the result .collect() with no matching key in rdd
>>> conf = (SparkConf() >>> [Link]() [('d', 1)]
.setMaster("local") ['a',7,7,'a','a',2,2,'a','b',2,2,'b'] >>> [Link](rdd2).collect() Return the Cartesian product of rdd
.setAppName("My app") >>> [Link](lambda x: x) Apply a flatMap function to each (key,value) and rdd2
.set("[Link]", "1g")) .collect() pair of rdd4 without changing the keys
>>> sc = SparkContext(conf = conf) [('a','x'),('a','y'),('a','z'),('b','p'),('b','r')]
Sort
Using The Shell Selecting Data >>> [Link](lambda x: x[1]) Sort RDD by given function
.collect()
In the PySpark shell, a special interpreter-aware SparkContext is already Getting [('d',1),('b',1),('a',2)]
created in the variable called sc. >>> [Link]() Return a list with all RDD elements >>> [Link]() Sort (key, value) RDD by key
[('a', 7), ('a', 2), ('b', 2)] .collect()
$ ./bin/spark-shell --master local[2] >>> [Link](2) Take first 2 RDD elements [('a',2),('b',1),('d',1)]
$ ./bin/pyspark --master local[4] --py-files [Link] [('a', 7), ('a', 2)]
>>> [Link]() Take first RDD element
Set which master the context connects to with the --master argument, and
('a', 7) Repartitioning
add Python .zip, .egg or .py files to the runtime path by passing a >>> [Link](2) Take top 2 RDD elements
[('b', 2), ('a', 7)] >>> [Link](4) New RDD with 4 partitions
comma-separated list to --py-files. >>> [Link](1) Decrease the number of partitions in the RDD to 1
Sampling
Loading Data >>> [Link](False, 0.15, 81).collect() Return sampled subset of rdd3
[3,4,27,31,40,41,42,43,60,76,79,80,86,97] Saving
Parallelized Collections Filtering >>> [Link]("[Link]")
>>> [Link](lambda x: "a" in x) Filter the RDD >>> [Link]("hdfs://namenodehost/parent/child",
>>> rdd = [Link]([('a',7),('a',2),('b',2)]) .collect() '[Link]')
>>> rdd2 = [Link]([('a',2),('d',1),('b',1)]) [('a',7),('a',2)]
>>> rdd3 = [Link](range(100)) >>> [Link]().collect() Return distinct RDD values
>>> rdd4 = [Link]([("a",["x","y","z"]),
("b",["p", "r"])])
['a',2,'b',7]
>>> [Link]().collect() Return (key,value) RDD's keys
Stopping SparkContext
['a', 'a', 'b'] >>> [Link]()
External Data
Read either one text file from HDFS, a local file system or any Hadoop-supported file system URI with textFile(), or read in a directory of text files with wholeTextFiles().
>>> textFile = [Link]("/my/directory/*.txt")
>>> textFile2 = [Link]("/my/directory/")
Iterating
>>> def g(x): print(x)
>>> [Link](g) Apply a function to all RDD elements
('a', 7)
('b', 2)
('a', 2)
Execution
$ ./bin/spark-submit examples/src/main/python/[Link]
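The Reducing, Grouping and Aggregating entries on this page are easier to follow once it is clear what they compute. The sketch below is a pure-Python emulation, not PySpark: `reduce_by_key`, `group_by_key` and `aggregate` are hypothetical helpers written for illustration, and the two-way split of `range(100)` into partitions is an assumption. It reproduces the results listed on the sheet for the `rdd` pairs and for the numbers 0 to 99.

```python
from collections import defaultdict
from functools import reduce

pairs = [('a', 7), ('a', 2), ('b', 2)]   # same data as rdd on the sheet


def reduce_by_key(records, func):
    """Merge the values of each key with func (hypothetical helper)."""
    merged = {}
    for key, value in records:
        merged[key] = func(merged[key], value) if key in merged else value
    return sorted(merged.items())


def group_by_key(records):
    """Collect the values of each key into a list (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return sorted(groups.items())


def aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition with seq_op, then merge the partials with comb_op."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


# Same operators as seqOp and combOp on the sheet: track (sum, count)
seq_op = lambda acc, value: (acc[0] + value, acc[1] + 1)
comb_op = lambda left, right: (left[0] + right[0], left[1] + right[1])

summed = reduce_by_key(pairs, lambda x, y: x + y)
grouped = group_by_key(pairs)
total_and_count = aggregate([range(0, 50), range(50, 100)],
                            (0, 0), seq_op, comb_op)

print(summed, grouped, total_and_count)
```

The two-stage shape of `aggregate` is the reason Spark asks for two functions: one runs inside each partition, the other merges the per-partition results.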
Python For Data Science Cheat Sheet Duplicate Values GroupBy
>>> df = [Link]()
>>> [Link]("age")\ Group by age, count the members in the groups
.count() \
.show()
PySpark - SQL Basics
Learn Python for data science Interactively at [Link] Queries
>>> from [Link] import functions as F
Select Filter
>>> [Link]("firstName").show() Show all entries in firstName column >>> [Link](df["age"]>24).show() Filter entries of age, only keep those
>>> [Link]("firstName","lastName") \ records of which the values are >24
PySpark & Spark SQL .show()
>>> [Link]("firstName", Show all entries in firstName, age

Spark SQL is Apache Spark's module for "age", and type

Sort
explode("phoneNumber") \
working with structured data. .alias("contactInfo")) \
.select("[Link]", >>> [Link]([Link]()).collect()
>>> [Link]("age", ascending=False).collect()
Initializing SparkSession "firstName",
"age") \ >>> [Link](["age","city"],ascending=[0,1])\
A SparkSession can be used to create DataFrames, register DataFrames as tables, .show() .collect()
execute SQL over tables, cache tables, and read parquet files. >>> [Link](df["firstName"],df["age"]+ 1) Show all entries in firstName and age,
.show() add 1 to the entries of age
>>> from [Link] import SparkSession
>>> spark = SparkSession \
>>> [Link](df['age'] > 24).show()
When
Show all entries where age >24 Missing & Replacing Values
.builder \ >>> [Link]("firstName", Show firstName and 0 or 1 depending
.appName("Python Spark SQL basic example") \ >>> [Link](50).show() Replace null values
[Link]([Link] > 30, 1) \ on age >30 >>> [Link]().show() Return new df omitting rows with null values
.config("[Link]", "some-value") \ .otherwise(0)) \
.getOrCreate() >>> [Link] \ Return new df replacing one value with
.show() .replace(10, 20) \ another
>>> df[[Link]("Jane","Boris")] Show firstName if in the given options .show()
Creating DataFrames Like
.collect()

From RDDs
>>> [Link]("firstName", Show firstName, and lastName is
[Link]("Smith")) \ TRUE if lastName is like Smith
Repartitioning
.show()
>>> from [Link] import * Startswith - Endswith >>> [Link](10)\ df with 10 partitions
>>> [Link]("firstName", Show firstName, and TRUE if .rdd \
Infer Schema .getNumPartitions()
>>> sc = [Link] [Link] \ lastName starts with Sm
.startswith("Sm")) \ >>> [Link](1).[Link]() df with 1 partition
>>> lines = [Link]("[Link]")
.show()
>>> parts = [Link](lambda l: [Link](",")) >>> [Link]([Link]("th")) \ Show last names ending in th
>>>
>>>
people = [Link](lambda p: Row(name=p[0],age=int(p[1])))
peopledf = [Link](people)
.show() Running SQL Queries Programmatically
Substring
Specify Schema >>> [Link]([Link](1, 3) \ Return substrings of firstName Registering DataFrames as Views
>>> people = [Link](lambda p: Row(name=p[0], .alias("name")) \
age=int(p[1].strip()))) .collect() >>> [Link]("people")
>>> schemaString = "name age" Between >>> [Link]("customer")
>>> fields = [StructField(field_name, StringType(), True) for >>> [Link]([Link](22, 24)) \ Show age: values are TRUE if between >>> [Link]("customer")
field_name in [Link]()] .show() 22 and 24
>>> schema = StructType(fields) Query Views
>>> [Link](people, schema).show()
+--------+---+
| name|age|
Add, Update & Remove Columns >>> df5 = [Link]("SELECT * FROM customer").show()
+--------+---+ >>> peopledf2 = [Link]("SELECT * FROM global_temp.people")\
|
|
Mine| 28|
Filip| 29|
Adding Columns .show()
|Jonathan| 30|
+--------+---+ >>> df = [Link]('city',[Link]) \
.withColumn('postalCode',[Link]) \
From Spark Data Sources .withColumn('state',[Link]) \
.withColumn('streetAddress',[Link]) \
Output
.withColumn('telePhoneNumber', Data Structures
JSON explode([Link])) \
>>> df = [Link]("[Link]") .withColumn('telePhoneType',
>>> [Link]() >>> rdd1 = [Link] Convert df into an RDD
+--------------------+---+---------+--------+--------------------+ explode([Link])) >>> [Link]().first() Convert df into a RDD of string
| address|age|firstName |lastName| phoneNumber|
+--------------------+---+---------+--------+--------------------+ >>> [Link]() Return the contents of df as Pandas
|[New York,10021,N...| 25|
|[New York,10021,N...| 21|
John|
Jane|
Smith|[[212 555-1234,ho...|
Doe|[[322 888-1234,ho...|
Updating Columns DataFrame
+--------------------+---+---------+--------+--------------------+
>>> df2 = [Link]("[Link]", format="json")
>>> df = [Link]('telePhoneNumber', 'phoneNumber') Write & Save to Files
Parquet files Removing Columns >>> [Link]("firstName", "city")\
>>> df3 = [Link]("[Link]") .write \
TXT files >>> df = [Link]("address", "phoneNumber") .save("[Link]")
>>> df4 = [Link]("[Link]") >>> df = [Link]([Link]).drop([Link]) >>> [Link]("firstName", "age") \
.write \
.save("[Link]",format="json")
Inspect Data
>>> [Link] Return df column names and data types >>> [Link]().show() Compute summary statistics Stopping SparkSession
>>> [Link]() Display the content of df >>> [Link] Return the columns of df
>>> [Link]() >>> [Link]()
>>> [Link]() Return first n rows Count the number of rows in df
>>> [Link]() Return first row >>> [Link]().count() Count the number of distinct rows in df
>>> [Link](2) Return the first n rows >>> [Link]() Print the schema of df DataCamp
>>> [Link] Return the schema of df >>> [Link]() Print the (logical and physical) plans
Learn Python for Data Science Interactively
Keras

Keras is a powerful and easy-to-use deep learning library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models.

A Basic Example
>>> import numpy as np
>>> from [Link] import Sequential
>>> from [Link] import Dense
>>> data = [Link]((1000,100))
>>> labels = [Link](2,size=(1000,1))
>>> model = Sequential()
>>> [Link](Dense(32,
        activation='relu',
        input_dim=100))
>>> [Link](Dense(1, activation='sigmoid'))
>>> [Link](optimizer='rmsprop',
        loss='binary_crossentropy',
        metrics=['accuracy'])
>>> [Link](data,labels,epochs=10,batch_size=32)
>>> predictions = [Link](data)

Data    Also see NumPy, Pandas & Scikit-Learn
Your data needs to be stored as NumPy arrays or as a list of NumPy arrays. Ideally, you split the data into training and test sets, for which you can also use the train_test_split function of sklearn.model_selection.

Keras Data Sets
>>> from [Link] import boston_housing,
        mnist,
        cifar10,
        imdb
>>> (x_train,y_train),(x_test,y_test) = mnist.load_data()
>>> (x_train2,y_train2),(x_test2,y_test2) = boston_housing.load_data()
>>> (x_train3,y_train3),(x_test3,y_test3) = cifar10.load_data()
>>> (x_train4,y_train4),(x_test4,y_test4) = imdb.load_data(num_words=20000)
>>> num_classes = 10

Other
>>> from [Link] import urlopen
>>> data = [Link](urlopen("[Link]
ml/machine-learning-databases/pima-indians-diabetes/
[Link]"),delimiter=",")
>>> X = data[:,0:8]
>>> y = data [:,8]

Model Architecture

Sequential Model
>>> from [Link] import Sequential
>>> model = Sequential()
>>> model2 = Sequential()
>>> model3 = Sequential()

Multilayer Perceptron (MLP)
Binary Classification
>>> from [Link] import Dense
>>> [Link](Dense(12,
        input_dim=8,
        kernel_initializer='uniform',
        activation='relu'))
>>> [Link](Dense(8,kernel_initializer='uniform',activation='relu'))
>>> [Link](Dense(1,kernel_initializer='uniform',activation='sigmoid'))
Multi-Class Classification
>>> from [Link] import Dropout
>>> [Link](Dense(512,activation='relu',input_shape=(784,)))
>>> [Link](Dropout(0.2))
>>> [Link](Dense(512,activation='relu'))
>>> [Link](Dropout(0.2))
>>> [Link](Dense(10,activation='softmax'))
Regression
>>> [Link](Dense(64,activation='relu',input_dim=train_data.shape[1]))
>>> [Link](Dense(1))

Convolutional Neural Network (CNN)
>>> from [Link] import Activation,Conv2D,MaxPooling2D,Flatten
>>> [Link](Conv2D(32,(3,3),padding='same',input_shape=x_train.shape[1:]))
>>> [Link](Activation('relu'))
>>> [Link](Conv2D(32,(3,3)))
>>> [Link](Activation('relu'))
>>> [Link](MaxPooling2D(pool_size=(2,2)))
>>> [Link](Dropout(0.25))
>>> [Link](Conv2D(64,(3,3), padding='same'))
>>> [Link](Activation('relu'))
>>> [Link](Conv2D(64,(3, 3)))
>>> [Link](Activation('relu'))
>>> [Link](MaxPooling2D(pool_size=(2,2)))
>>> [Link](Dropout(0.25))
>>> [Link](Flatten())
>>> [Link](Dense(512))
>>> [Link](Activation('relu'))
>>> [Link](Dropout(0.5))
>>> [Link](Dense(num_classes))
>>> [Link](Activation('softmax'))

Recurrent Neural Network (RNN)
>>> from [Link] import Embedding,LSTM
>>> [Link](Embedding(20000,128))
>>> [Link](LSTM(128,dropout=0.2,recurrent_dropout=0.2))
>>> [Link](Dense(1,activation='sigmoid'))

Inspect Model
>>> model.output_shape    Model output shape
>>> [Link]()    Model summary representation
>>> model.get_config()    Model configuration
>>> model.get_weights()    List all weight tensors in the model

Compile Model
MLP: Binary Classification
>>> [Link](optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy'])
MLP: Multi-Class Classification
>>> [Link](optimizer='rmsprop',
        loss='categorical_crossentropy',
        metrics=['accuracy'])
MLP: Regression
>>> [Link](optimizer='rmsprop',
        loss='mse',
        metrics=['mae'])
Recurrent Neural Network
>>> [Link](loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])

Model Training
>>> [Link](x_train4,
        y_train4,
        batch_size=32,
        epochs=15,
        verbose=1,
        validation_data=(x_test4,y_test4))

Evaluate Your Model's Performance
>>> score = [Link](x_test,
        y_test,
        batch_size=32)

Prediction
>>> [Link](x_test4, batch_size=32)
>>> model3.predict_classes(x_test4,batch_size=32)

Save/ Reload Models
>>> from [Link] import load_model
>>> [Link]('model_file.h5')
>>> my_model = load_model('my_model.h5')

Model Fine-tuning
Optimization Parameters
>>> from [Link] import RMSprop
>>> opt = RMSprop(lr=0.0001, decay=1e-6)
>>> [Link](loss='categorical_crossentropy',
        optimizer=opt,
        metrics=['accuracy'])
Early Stopping
>>> from [Link] import EarlyStopping
>>> early_stopping_monitor = EarlyStopping(patience=2)
>>> [Link](x_train4,
        y_train4,
        batch_size=32,
        epochs=15,
        validation_data=(x_test4,y_test4),
        callbacks=[early_stopping_monitor])

Preprocessing    Also see NumPy & Scikit-Learn

Sequence Padding
>>> from [Link] import sequence
>>> x_train4 = sequence.pad_sequences(x_train4,maxlen=80)
>>> x_test4 = sequence.pad_sequences(x_test4,maxlen=80)

One-Hot Encoding
>>> from [Link] import to_categorical
>>> Y_train = to_categorical(y_train, num_classes)
>>> Y_test = to_categorical(y_test, num_classes)
>>> Y_train3 = to_categorical(y_train3, num_classes)
>>> Y_test3 = to_categorical(y_test3, num_classes)

Train and Test Sets
>>> from sklearn.model_selection import train_test_split
>>> X_train5,X_test5,y_train5,y_test5 = train_test_split(X,
        y,
        test_size=0.33,
        random_state=42)

Standardization/Normalization
>>> from [Link] import StandardScaler
>>> scaler = StandardScaler().fit(x_train2)
>>> standardized_X = [Link](x_train2)
>>> standardized_X_test = [Link](x_test2)
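The two preprocessing helpers above, sequence.pad_sequences and to_categorical, are easier to reason about once you see what they do to a tiny input. The following is a minimal NumPy-only sketch, not the Keras implementation: the helper names pad_pre and one_hot are hypothetical, and padding and truncating at the front of each sequence is an assumption meant to mirror the Keras defaults.

```python
import numpy as np


def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and left-truncate) integer sequences to a fixed length."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int64)
    for i, s in enumerate(seqs):
        tail = list(s)[-maxlen:]          # keep at most the last `maxlen` items
        if tail:
            out[i, maxlen - len(tail):] = tail
    return out


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=np.int64).ravel()
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out


padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4)
encoded = one_hot([0, 2, 1], num_classes=3)
print(padded)
print(encoded)
```

Short sequences gain leading zeros and long ones lose their earliest items, so every row ends up with the same length; each label becomes a row whose only non-zero entry sits at the label's index.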
Bokeh
Learn Bokeh Interactively at [Link], taught by Bryan Van de Ven, core contributor

Plotting With Bokeh
The Python interactive visualization library Bokeh enables high-performance visual presentation of large datasets in modern web browsers.

Bokeh’s mid-level general purpose [Link] interface is centered around two main components: data and glyphs.

data + glyphs = plot

The basic steps to creating plots with the [Link] interface are:
1. Prepare some data: Python lists, NumPy arrays, Pandas DataFrames and other sequences of values
2. Create a new plot
3. Add renderers for your data, with visual customizations
4. Specify where to generate the output
5. Show or save the results
>>> from [Link] import figure
>>> from [Link] import output_file, show
>>> x = [1, 2, 3, 4, 5]    Step 1
>>> y = [6, 7, 2, 4, 5]
>>> p = figure(title="simple line example",    Step 2
        x_axis_label='x',
        y_axis_label='y')
>>> [Link](x, y, legend="Temp.", line_width=2)    Step 3
>>> output_file("[Link]")    Step 4
>>> show(p)    Step 5

1 Data    Also see Lists, NumPy & Pandas
Under the hood, your data is converted to Column Data Sources. You can also do this manually:
>>> import numpy as np
>>> import pandas as pd
>>> df = [Link]([Link]([[33.9,4,65, 'US'],
        [32.4,4,66, 'Asia'],
        [21.4,4,109, 'Europe']]),
        columns=['mpg','cyl', 'hp', 'origin'],
        index=['Toyota', 'Fiat', 'Volvo'])
>>> from [Link] import ColumnDataSource
>>> cds_df = ColumnDataSource(df)

2 Plotting
>>> from [Link] import figure
>>> p1 = figure(plot_width=300, tools='pan,box_zoom')
>>> p2 = figure(plot_width=300, plot_height=300,
        x_range=(0, 8), y_range=(0, 8))
>>> p3 = figure()

3 Renderers & Visual Customizations

Glyphs
Scatter Markers
>>> [Link]([Link]([1,2,3]), [Link]([3,2,1]),
        fill_color='white')
>>> [Link]([Link]([1.5,3.5,5.5]), [1,4,3],
        color='blue', size=1)
Line Glyphs
>>> [Link]([1,2,3,4], [3,4,5,6], line_width=2)
>>> p2.multi_line([Link]([[1,2,3],[5,6,7]]),
        [Link]([[3,4,5],[3,2,1]]),
        color="blue")

Customized Glyphs    Also see Data
Selection and Non-Selection Glyphs
>>> p = figure(tools='box_select')
>>> [Link]('mpg', 'cyl', source=cds_df,
        selection_color='red',
        nonselection_alpha=0.1)
Hover Glyphs
>>> hover = HoverTool(tooltips=None, mode='vline')
>>> p3.add_tools(hover)
Colormapping
>>> color_mapper = CategoricalColorMapper(
        factors=['US', 'Asia', 'Europe'],
        palette=['blue', 'red', 'green'])
>>> [Link]('mpg', 'cyl', source=cds_df,
        color=dict(field='origin',
        transform=color_mapper),
        legend='Origin')

Rows & Columns Layout
Rows
>>> from [Link] import row
>>> layout = row(p1,p2,p3)
Columns
>>> from [Link] import column
>>> layout = column(p1,p2,p3)
Nesting Rows & Columns
>>> layout = row(column(p1,p2), p3)

Grid Layout
>>> from [Link] import gridplot
>>> row1 = [p1,p2]
>>> row2 = [p3]
>>> layout = gridplot([[p1,p2],[p3]])

Tabbed Layout
>>> from [Link] import Panel, Tabs
>>> tab1 = Panel(child=p1, title="tab1")
>>> tab2 = Panel(child=p2, title="tab2")
>>> layout = Tabs(tabs=[tab1, tab2])

Linked Plots    Also see Data
Linked Axes
>>> p2.x_range = p1.x_range
>>> p2.y_range = p1.y_range
Linked Brushing
>>> p4 = figure(plot_width = 100, tools='box_select,lasso_select')
>>> [Link]('mpg', 'cyl', source=cds_df)
>>> p5 = figure(plot_width = 200, tools='box_select,lasso_select')
>>> [Link]('mpg', 'hp', source=cds_df)
>>> layout = row(p4,p5)

Legends
Legend Location
Inside Plot Area
>>> [Link] = 'bottom_left'
Outside Plot Area
>>> r1 = [Link]([Link]([1,2,3]), [Link]([3,2,1]))
>>> r2 = [Link]([1,2,3,4], [3,4,5,6])
>>> legend = Legend(items=[("One" , [p1, r1]),("Two" , [r2])], location=(0, -30))
>>> p.add_layout(legend, 'right')
Legend Orientation
>>> [Link] = "horizontal"
>>> [Link] = "vertical"
Legend Background & Border
>>> [Link].border_line_color = "navy"
>>> [Link].background_fill_color = "white"

4 Output
Output to HTML File
>>> from [Link] import output_file, show
>>> output_file('my_bar_chart.html', mode='cdn')
Notebook Output
>>> from [Link] import output_notebook, show
>>> output_notebook()
Embedding
Standalone HTML
>>> from [Link] import file_html
>>> html = file_html(p, CDN, "my_plot")
Components
>>> from [Link] import components
>>> script, div = components(p)

5 Show or Save Your Plots
>>> show(p1)
>>> show(layout)
>>> save(p1)
>>> save(layout)

Statistical Charts With Bokeh    Also see Data
Bokeh’s high-level [Link] interface is ideal for quickly creating statistical charts.
Bar Chart
>>> from [Link] import Bar
>>> p = Bar(df, stacked=True, palette=['red','blue'])
Box Plot
>>> from [Link] import BoxPlot
>>> p = BoxPlot(df, values='vals', label='cyl',
        legend='bottom_right')
Histogram
>>> from [Link] import Histogram
>>> p = Histogram(df, title='Histogram')
Scatter Plot
>>> from [Link] import Scatter
>>> p = Scatter(df, x='mpg', y ='hp', marker='square',
        xlabel='Miles Per Gallon',
        ylabel='Horsepower')
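One detail of the data-preparation step is worth checking before plotting: a NumPy array that mixes numbers and strings stores every value as a string, so a DataFrame built from it has no numeric columns. The sketch below assumes only NumPy and pandas, and builds the same three cars column by column so that mpg, cyl and hp keep numeric dtypes. The final dictionary only illustrates a column-oriented layout; it is not a claim about Bokeh's internal representation, and the name columns_view is hypothetical.

```python
import numpy as np
import pandas as pd

# Mixing floats, ints and strings in one array coerces everything to text.
mixed = np.array([[33.9, 4, 65, 'US'],
                  [32.4, 4, 66, 'Asia'],
                  [21.4, 4, 109, 'Europe']])
print(mixed.dtype.kind)  # 'U' means a unicode string dtype

# Building column by column keeps each column's natural dtype.
df = pd.DataFrame(
    {
        'mpg': [33.9, 32.4, 21.4],
        'cyl': [4, 4, 4],
        'hp': [65, 66, 109],
        'origin': ['US', 'Asia', 'Europe'],
    },
    index=['Toyota', 'Fiat', 'Volvo'],
)
print(df.dtypes)

# Column-oriented view: one list of values per column name.
columns_view = df.to_dict(orient='list')
print(columns_view['hp'])
```

With numeric dtypes in place, axis ranges and numeric comparisons behave as expected when the frame is handed to a plotting library.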
SciPy - Linear Algebra

SciPy
The SciPy library is one of the core packages for scientific computing that provides mathematical algorithms and convenience functions built on the NumPy extension of Python.

Interacting With NumPy    Also see NumPy
>>> import numpy as np
>>> a = [Link]([1,2,3])
>>> b = [Link]([(1+5j,2j,3j), (4j,5j,6j)])
>>> c = [Link]([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]])

Index Tricks
>>> [Link][0:5,0:5]    Create a dense meshgrid
>>> [Link][0:2,0:2]    Create an open meshgrid
>>> np.r_[3,[0]*5,-[Link]j]    Stack arrays vertically (row-wise)
>>> np.c_[b,c]    Create stacked column-wise arrays

Shape Manipulation
>>> [Link](b)    Permute array dimensions
>>> [Link]()    Flatten the array
>>> [Link]((b,c))    Stack arrays horizontally (column-wise)
>>> [Link]((a,b))    Stack arrays vertically (row-wise)
>>> [Link](c,2)    Split the array horizontally at the 2nd index
>>> [Link](d,2)    Split the array vertically at the 2nd index

Polynomials
>>> from numpy import poly1d
>>> p = poly1d([3,4,5])    Create a polynomial object

Vectorizing Functions
>>> def myfunc(a):
        if a < 0:
            return a*2
        else:
            return a/2
>>> [Link](myfunc)    Vectorize functions

Type Handling
>>> [Link](b)    Return the real part of the array elements
>>> [Link](b)    Return the imaginary part of the array elements
>>> np.real_if_close(c,tol=1000)    Return a real array if complex parts close to 0
>>> [Link]['f']([Link])    Cast object to a data type

Other Useful Functions
>>> [Link](b,deg=True)    Return the angle of the complex argument
>>> g = [Link](0,[Link],num=5)    Create an array of evenly spaced values (number of samples)
>>> g [3:] += [Link]
>>> [Link](g)    Unwrap
>>> [Link](0,10,3)    Create an array of evenly spaced values (log scale)
>>> [Link]([c<4],[c*2])    Return values from a list of arrays depending on conditions
>>> [Link](a)    Factorial
>>> [Link](10,3,exact=True)    Combine N things taken at k time
>>> misc.central_diff_weights(3)    Weights for Np-point central derivative
>>> [Link](myfunc,1.0)    Find the n-th derivative of a function at a point

Linear Algebra    Also see NumPy
You’ll use the linalg and sparse modules. Note that [Link] contains and expands on [Link].
>>> from scipy import linalg, sparse

Creating Matrices
>>> A = [Link]([Link]((2,2)))
>>> B = [Link](b)
>>> C = [Link]([Link]((10,5)))
>>> D = [Link]([[3,4], [5,6]])

Basic Matrix Routines
Inverse
>>> A.I    Inverse
>>> [Link](A)    Inverse
Transposition
>>> A.T    Transpose matrix
>>> A.H    Conjugate transposition
Trace
>>> [Link](A)    Trace
Norm
>>> [Link](A)    Frobenius norm
>>> [Link](A,1)    L1 norm (max column sum)
>>> [Link](A,[Link])    L inf norm (max row sum)
Rank
>>> [Link].matrix_rank(C)    Matrix rank
Determinant
>>> [Link](A)    Determinant
Solving linear problems
>>> [Link](A,b)    Solver for dense matrices
>>> E = [Link](a).T    Solver for dense matrices
>>> [Link](F,E)    Least-squares solution to linear matrix equation
Generalized inverse
>>> [Link](C)    Compute the pseudo-inverse of a matrix (least-squares solver)
>>> linalg.pinv2(C)    Compute the pseudo-inverse of a matrix (SVD)

Creating Sparse Matrices
>>> F = [Link](3, k=1)    Create a 3x3 matrix with ones on the first superdiagonal
>>> G = [Link]([Link](2))    Create a 2x2 identity matrix
>>> C[C > 0.5] = 0
>>> H = sparse.csr_matrix(C)    Compressed Sparse Row matrix
>>> I = sparse.csc_matrix(D)    Compressed Sparse Column matrix
>>> J = sparse.dok_matrix(A)    Dictionary Of Keys matrix
>>> [Link]()    Sparse matrix to full matrix
>>> sparse.isspmatrix_csc(A)    Identify sparse matrix

Sparse Matrix Routines
Inverse
>>> [Link](I)    Inverse
Norm
>>> [Link](I)    Norm
Solving linear problems
>>> [Link](H,I)    Solver for sparse matrices

Sparse Matrix Functions
>>> [Link](I)    Sparse matrix exponential

Asking For Help
>>> help([Link])
>>> [Link]([Link])

Matrix Functions
Addition
>>> [Link](A,D)    Addition
Subtraction
>>> [Link](A,D)    Subtraction
Division
>>> [Link](A,D)    Division
Multiplication
>>> A @ D    Multiplication operator (Python 3)
>>> [Link](D,A)    Multiplication
>>> [Link](A,D)    Dot product
>>> [Link](A,D)    Vector dot product
>>> [Link](A,D)    Inner product
>>> [Link](A,D)    Outer product
>>> [Link](A,D)    Tensor dot product
>>> [Link](A,D)    Kronecker product
Exponential Functions
>>> [Link](A)    Matrix exponential
>>> linalg.expm2(A)    Matrix exponential (Taylor Series)
>>> linalg.expm3(D)    Matrix exponential (eigenvalue decomposition)
Logarithm Function
>>> [Link](A)    Matrix logarithm
Trigonometric Functions
>>> [Link](D)    Matrix sine
>>> [Link](D)    Matrix cosine
>>> [Link](A)    Matrix tangent
Hyperbolic Trigonometric Functions
>>> [Link](D)    Hyperbolic matrix sine
>>> [Link](D)    Hyperbolic matrix cosine
>>> [Link](A)    Hyperbolic matrix tangent
Matrix Sign Function
>>> [Link](A)    Matrix sign function
Matrix Square Root
>>> [Link](A)    Matrix square root
Arbitrary Functions
>>> [Link](A, lambda x: x*x)    Evaluate matrix function

Decompositions
Eigenvalues and Eigenvectors
>>> la, v = [Link](A)    Solve ordinary or generalized eigenvalue problem for square matrix
>>> l1, l2 = la    Unpack eigenvalues
>>> v[:,0]    First eigenvector
>>> v[:,1]    Second eigenvector
>>> [Link](A)    Unpack eigenvalues
Singular Value Decomposition
>>> U,s,Vh = [Link](B)    Singular Value Decomposition (SVD)
>>> M,N = [Link]
>>> Sig = [Link](s,M,N)    Construct sigma matrix in SVD
LU Decomposition
>>> P,L,U = [Link](C)    LU Decomposition

Sparse Matrix Decompositions
>>> la, v = [Link](F,1)    Eigenvalues and eigenvectors
>>> [Link](H, 2)    SVD
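Several of the routines listed above can be checked against each other, which is a quick way to confirm you are calling them correctly. The following is a small sketch under stated assumptions: it uses plain NumPy arrays rather than the matrix type, the 2x2 matrix A and the vector rhs are made-up illustration data, and only scipy.linalg functions that I expect to exist in current releases are called (solve, inv, svd, diagsvd, eig, expm).

```python
import numpy as np
from scipy import linalg

# Illustration data (hypothetical): a small invertible matrix and a right-hand side.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
rhs = np.array([1.0, 2.0])

# Dense solve and explicit inverse.
x = linalg.solve(A, rhs)
A_inv = linalg.inv(A)

# SVD: rebuild A from its factors using the sigma matrix from diagsvd.
U, s, Vh = linalg.svd(A)
Sig = linalg.diagsvd(s, *A.shape)
A_rebuilt = U @ Sig @ Vh

# Matrix exponential, cross-checked through the eigendecomposition
# exp(A) = V diag(exp(lambda)) V^-1, valid here because A is diagonalizable.
eigvals, eigvecs = linalg.eig(A)
expA = linalg.expm(A)
expA_eig = (eigvecs @ np.diag(np.exp(eigvals)) @ linalg.inv(eigvecs)).real

print(x)
print(np.allclose(A_rebuilt, A), np.allclose(expA, expA_eig))
```

Using solve is generally preferable to multiplying by an explicit inverse; the inverse appears here only so the two results can be compared.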
Scikit-Learn

Scikit-learn
Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example
>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from [Link] import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = [Link][:, :2], [Link]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = [Link]().fit(X_train)
>>> X_train = [Link](X_train)
>>> X_test = [Link](X_test)
>>> knn = [Link](n_neighbors=5)
>>> [Link](X_train, y_train)
>>> y_pred = [Link](X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data    Also see NumPy & Pandas
Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrame, are also acceptable.
>>> import numpy as np
>>> X = [Link]((10,5))
>>> y = [Link](['M','M','F','F','M','F','M','M','F','F'])
>>> X[X < 0.7] = 0

Training And Test Data
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X,
        y,
        random_state=0)

Preprocessing The Data

Standardization
>>> from [Link] import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = [Link](X_train)
>>> standardized_X_test = [Link](X_test)

Normalization
>>> from [Link] import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = [Link](X_train)
>>> normalized_X_test = [Link](X_test)

Binarization
>>> from [Link] import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = [Link](X)

Encoding Categorical Features
>>> from [Link] import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)

Imputing Missing Values
>>> from [Link] import Imputer
>>> imp = Imputer(missing_values=0, strategy='mean', axis=0)
>>> imp.fit_transform(X_train)

Generating Polynomial Features
>>> from [Link] import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Create Your Model

Supervised Learning Estimators
Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression(normalize=True)
Support Vector Machines (SVM)
>>> from [Link] import SVC
>>> svc = SVC(kernel='linear')
Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
KNN
>>> from sklearn import neighbors
>>> knn = [Link](n_neighbors=5)

Unsupervised Learning Estimators
Principal Component Analysis (PCA)
>>> from [Link] import PCA
>>> pca = PCA(n_components=0.95)
K Means
>>> from [Link] import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting
Supervised learning
>>> [Link](X, y)    Fit the model to the data
>>> [Link](X_train, y_train)
>>> [Link](X_train, y_train)
Unsupervised Learning
>>> k_means.fit(X_train)    Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)    Fit to data, then transform it

Prediction
Supervised Estimators
>>> y_pred = [Link]([Link]((2,5)))    Predict labels
>>> y_pred = [Link](X_test)    Predict labels
>>> y_pred = knn.predict_proba(X_test)    Estimate probability of a label
Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)    Predict labels in clustering algos

Evaluate Your Model’s Performance

Classification Metrics
Accuracy Score
>>> [Link](X_test, y_test)    Estimator score method
>>> from [Link] import accuracy_score    Metric scoring functions
>>> accuracy_score(y_test, y_pred)
Classification Report
>>> from [Link] import classification_report    Precision, recall, f1-score and support
>>> print(classification_report(y_test, y_pred))
Confusion Matrix
>>> from [Link] import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics
Mean Absolute Error
>>> from [Link] import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)
Mean Squared Error
>>> from [Link] import mean_squared_error
>>> mean_squared_error(y_test, y_pred)
R² Score
>>> from [Link] import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics
Adjusted Rand Index
>>> from [Link] import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)
Homogeneity
>>> from [Link] import homogeneity_score
>>> homogeneity_score(y_true, y_pred)
V-measure
>>> from [Link] import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Tune Your Model

Grid Search
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": [Link](1,3),
        "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn,
        param_grid=params)
>>> [Link](X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

Randomized Parameter Optimization
>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1,5),
        "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn,
        param_distributions=params,
        cv=4,
        n_iter=8,
        random_state=5)
>>> [Link](X_train, y_train)
>>> print(rsearch.best_score_)
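The sheet presents splitting, scaling, fitting, cross-validation and tuning as separate fragments. As a rough end-to-end sketch of how they chain together, the example below runs them in order on the iris data. Several names are assumptions, because the sheet's own import paths are partly redacted: StandardScaler, KNeighborsClassifier and the sklearn.metrics and sklearn.model_selection paths are the ones I would expect in a current scikit-learn install, and the parameter grid values are arbitrary illustration choices.

```python
from sklearn import datasets, neighbors, preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Load the data and hold out a test set.
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Cross-validate a baseline k-nearest-neighbours classifier.
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
cv_scores = cross_val_score(knn, X_train_s, y_train, cv=4)

# Tune hyperparameters with a small, arbitrary grid.
param_grid = {"n_neighbors": [1, 3, 5], "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=knn, param_grid=param_grid, cv=4)
grid.fit(X_train_s, y_train)

# Evaluate the refitted best estimator on the held-out test set.
y_pred = grid.predict(X_test_s)
test_acc = accuracy_score(y_test, y_pred)
print(cv_scores.mean(), grid.best_params_, test_acc)
```

Fitting the scaler on the training split alone keeps information from the test set out of the preprocessing step, which is why transform rather than fit_transform is applied to X_test.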

1 page
Keras Deep Learning Cheat Sheet
No ratings yet
Keras Deep Learning Cheat Sheet
1 page
Python Bokeh Cheat Sheet PDF
No ratings yet
Python Bokeh Cheat Sheet PDF
1 page
PySpark Cheat Sheet Python
No ratings yet
PySpark Cheat Sheet Python
1 page
NumPy Boolean Masks and Logic
No ratings yet
NumPy Boolean Masks and Logic
5 pages
Data Analytics Chennai
No ratings yet
Data Analytics Chennai
20 pages
PWP All Model by Campusify
No ratings yet
PWP All Model by Campusify
135 pages
All Python CS
100% (2)
All Python CS
10 pages
COS324 Course Notes
No ratings yet
COS324 Course Notes
256 pages
Implementation of CALFEM For Python
No ratings yet
Implementation of CALFEM For Python
63 pages
3.2 Numpy Array Sample Programs
No ratings yet
3.2 Numpy Array Sample Programs
54 pages
100 Must-Know PythonMl Interview Questions and Answers 2024 - Devinterview - Io
No ratings yet
100 Must-Know PythonMl Interview Questions and Answers 2024 - Devinterview - Io
1 page
LSB Image Steganography in Python
No ratings yet
LSB Image Steganography in Python
10 pages
Python PBL
No ratings yet
Python PBL
9 pages
Class XII Holiday Homework 2024-25
No ratings yet
Class XII Holiday Homework 2024-25
34 pages
Mini Project
100% (1)
Mini Project
27 pages
Essential Guide To Data Science For Petroleum Engineers
No ratings yet
Essential Guide To Data Science For Petroleum Engineers
150 pages
NIELIT DS Internship Report
No ratings yet
NIELIT DS Internship Report
23 pages
Artificial Intelligence Lab 3
No ratings yet
Artificial Intelligence Lab 3
6 pages
Python Advanced and Statistics
No ratings yet
Python Advanced and Statistics
24 pages
OceanofPDF - Com Python Experiments in Physics and Astronomy - Padraig Houlahan
No ratings yet
OceanofPDF - Com Python Experiments in Physics and Astronomy - Padraig Houlahan
280 pages
Informatics Practices Record
No ratings yet
Informatics Practices Record
92 pages
SIC - AI - Chapter 3. Exploratory Data Analysis - Rev2.0
No ratings yet
SIC - AI - Chapter 3. Exploratory Data Analysis - Rev2.0
527 pages
Python Data Analysis with Numpy & Pandas
No ratings yet
Python Data Analysis with Numpy & Pandas
19 pages
Introduction To Pandas
No ratings yet
Introduction To Pandas
26 pages
1-10: Multiple Choice Questions (MCQS) : NP - Array ( ( (1,2), (3,4) ) )
No ratings yet
1-10: Multiple Choice Questions (MCQS) : NP - Array ( ( (1,2), (3,4) ) )
5 pages
Excel With Ai A Comprehensive Guide To Copilot and Python Integration Publishing Download
100% (1)
Excel With Ai A Comprehensive Guide To Copilot and Python Integration Publishing Download
82 pages
Multinomial Distribution
No ratings yet
Multinomial Distribution
1 page
COMPUTER APPLICATIONS - Programming With Python
No ratings yet
COMPUTER APPLICATIONS - Programming With Python
4 pages
Data Science Exam Instructions and Questions
No ratings yet
Data Science Exam Instructions and Questions
8 pages
Python Assignment 1.ipynb - Colaboratory
No ratings yet
Python Assignment 1.ipynb - Colaboratory
3 pages
Python Module-4
No ratings yet
Python Module-4
109 pages
Pranav Dharmarajan E: Personal Information Profile
No ratings yet
Pranav Dharmarajan E: Personal Information Profile
1 page
12 Ai Practical File
100% (3)
12 Ai Practical File
5 pages
