Bokeh Cheat Sheet Python For Data Science: 3 Renderers & Visual Customizations
Bokeh’s mid-level general-purpose bokeh.plotting interface is centered around two main components: data and glyphs.

The basic steps to creating plots with the bokeh.plotting interface are:
1. Prepare some data (Python lists, NumPy arrays, Pandas DataFrames and other sequences of values)
2. Create a new plot
3. Add renderers for your data, with visual customizations
4. Specify where to generate the output
5. Show or save the results

Glyphs
Scatter Markers
>>> p1.circle(np.array([1,2,3]), np.array([3,2,1]),
              fill_color='white')
>>> p2.square(np.array([1.5,3.5,5.5]), [1,4,3],
              color='blue', size=1)
Line Glyphs
>>> p1.line([1,2,3,4], [3,4,5,6], line_width=2)
>>> p2.multi_line(pd.DataFrame([[1,2,3],[5,6,7]]),
                  pd.DataFrame([[3,4,5],[3,2,1]]),
                  color="blue")

Customized Glyphs
Selection and Non-Selection Glyphs
>>> p = figure(tools='box_select')
>>> p.circle('mpg', 'cyl', source=cds_df,
             selection_color='red',
             nonselection_alpha=0.1)
Hover Glyphs
>>> from bokeh.models import HoverTool
>>> hover = HoverTool(tooltips=None, mode='vline')
>>> p3.add_tools(hover)

Rows & Columns Layout
Rows
>>> from bokeh.layouts import row
>>> layout = row(p1,p2,p3)
Columns
>>> from bokeh.layouts import column
>>> layout = column(p1,p2,p3)
Grid Layout
>>> from bokeh.layouts import gridplot
>>> row1 = [p1,p2]
>>> row2 = [p3]
>>> layout = gridplot([row1,row2])
Tabbed Layout
>>> from bokeh.models.widgets import Panel, Tabs
>>> tab1 = Panel(child=p1, title="tab1")
>>> tab2 = Panel(child=p2, title="tab2")
>>> layout = Tabs(tabs=[tab1,tab2])

Linked Brushing
>>> p4 = figure(plot_width=100, tools='box_select,lasso_select')
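The extraction stops after creating p4; a minimal sketch of the rest of a linked-brushing setup (assuming the cds_df ColumnDataSource built in the Data section below) is:

>>> p4.circle('mpg', 'cyl', source=cds_df) #Both plots share one data source
>>> p5 = figure(plot_width=200, tools='box_select,lasso_select')
>>> p5.circle('mpg', 'hp', source=cds_df) #Selections propagate between p4 and p5
>>> layout = row(p4,p5)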
Pandas Data Wrangling

> Combining Data
Merge
>>> pd.merge(data1,
             data2,
             how='left',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='right',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='inner',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='outer',
             on='X1')
Join
>>> data1.join(data2, how='right')

> Querying
>>> df6.query('second > first') #Query DataFrame

> Renaming
>>> df.rename(index=str,
              columns={"Country":"cntry",
                       "Capital":"cptl",
                       "Population":"ppltn"})
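The merge calls above need two small frames that the extraction lost; a minimal pair making them runnable (the contents are an assumption) is:

>>> import pandas as pd
>>> data1 = pd.DataFrame({'X1': ['a','b','c'], 'X2': [11.432, 1.303, 99.906]})
>>> data2 = pd.DataFrame({'X1': ['a','b','d'], 'X3': [20.784, 20.784, 20.784]})
>>> pd.merge(data1, data2, how='outer', on='X1') #Union of keys; missing cells become NaN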
> Pivot
>>> df3 = df2.pivot(index='Date', #Spread rows into columns
                    columns='Type',
                    values='Value')

> Pivot Table
>>> df4 = pd.pivot_table(df2, #Spread rows into columns
                         values='Value',
                         index='Date',
                         columns=['Type'])

> Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])
Forward Filling
>>> df.reindex(range(4),
               method='ffill')
Backward Filling
>>> s3 = s.reindex(range(5),
                   method='bfill')

> MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])

> Combining Data (Horizontal/Vertical)
>>> s.append(s2) #Vertical
>>> pd.concat([s,s2], axis=1, keys=['One','Two']) #Horizontal/Vertical
>>> pd.concat([data1, data2], axis=1, join='inner')

> Melt
>>> pd.melt(df2, #Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")

> Dates
>>> df2['Date'] = pd.to_datetime(df2['Date'])

> Duplicate Data
>>> s3.unique() #Return unique values
>>> df2.duplicated('Type') #Check duplicates
>>> df2.drop_duplicates('Type', keep='last') #Drop duplicates

> Aggregation
>>> df4.groupby(level=0).sum()
Data Science For Business

Exploration and Visualization

The type of dashboard you should use depends on what you’ll be using it for.

Common Dashboard Elements
Type               What is it best for?
Stacked bar chart  Tracking composition over time

Common dashboard tools: Excel, Sheets, Power BI, Tableau, R Shiny, d3.js, Looker.

When You Should Request a Dashboard
- When you’ll use it multiple times

Experimentation and Prediction

Machine Learning
Machine learning is an application of artificial intelligence (AI) that builds algorithms and statistical models, trained on data, to address specific questions without explicit instructions. Example applications: recommendation systems, email subject optimization, churn prediction, image segmentation, customer segmentation.

Time Series Forecasting is a technique for predicting events through a sequence of time and can capture seasonality or periodic events.

Natural Language Processing (NLP) allows computers to process and analyze large amounts of natural language data.
- Text as input data
- Word counts track the important words in a text
- Word embeddings create features that group similar words (see the sketch below)
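To make the word-count idea concrete, here is a small sketch using scikit-learn's CountVectorizer (the toy corpus is an assumption, and CountVectorizer itself is not mentioned in the original sheet):

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> texts = ["the cat sat", "the cat ran"] #Toy corpus
>>> vec = CountVectorizer()
>>> vec.fit_transform(texts).toarray() #One row per text, one column per vocabulary word
array([[1, 0, 1, 1],
       [1, 1, 0, 1]])
>>> vec.get_feature_names_out()
array(['cat', 'ran', 'sat', 'the'], dtype=object)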
Deep Learning / Neural Networks enable unsupervised machine learning using data that is unstructured or unlabeled.

Explainable AI is an emerging field in machine learning that applies AI such that results can be easily understood.
Keras Cheat Sheet: Python For Data Science

> Model Architecture
Sequential Model
>>> from keras.models import Sequential
>>> model = Sequential()
>>> model2 = Sequential()
>>> model3 = Sequential()

Multilayer Perceptron (MLP)
>>> from keras.layers import Dense
>>> model.add(Dense(12,
                    input_dim=8,
                    activation='relu'))
>>> model.add(Dense(8,kernel_initializer='uniform',activation='relu'))
>>> model.add(Dense(512,activation='relu',input_shape=(784,)))
>>> model.add(Dense(32,
                    activation='relu',
                    input_dim=100))

Convolutional Neural Network (CNN)
>>> from keras.layers import Activation, Conv2D, MaxPooling2D, Dropout, Flatten
>>> model2.add(Conv2D(32,(3,3),padding='same',input_shape=x_train.shape[1:]))
>>> model2.add(Activation('relu'))
>>> model2.add(Conv2D(32,(3,3)))
>>> model2.add(Activation('relu'))
>>> model2.add(Conv2D(64,(3,3)))
>>> model2.add(Activation('relu'))
>>> model2.add(MaxPooling2D(pool_size=(2,2)))
>>> model2.add(Dropout(0.25))
>>> model2.add(Flatten())
>>> model2.add(Dense(512))
>>> model2.add(Activation('relu'))
>>> model2.add(Dense(num_classes))
>>> model2.add(Activation('softmax'))

Recurrent Neural Network (RNN)
>>> from keras.layers import LSTM
>>> model3.add(LSTM(128,dropout=0.2,recurrent_dropout=0.2))

> Inspect Model
>>> model.summary() #Model summary representation
>>> model.get_config() #Model configuration
>>> model.get_weights() #List all weight tensors in the model

> Compile Model
MLP: Multi-Class Classification
>>> model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
MLP: Regression
>>> model.compile(optimizer='rmsprop',
                  loss='mse',
                  metrics=['mae'])
RNN
>>> model3.compile(optimizer='rmsprop',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])
CNN
>>> model2.compile(loss='categorical_crossentropy',
                   optimizer=opt,
                   metrics=['accuracy'])

> Model Training
>>> model.fit(data,labels,epochs=10,batch_size=32)
>>> model3.fit(x_train4,
               y_train4,
               batch_size=32,
               epochs=15,
               verbose=1,
               validation_data=(x_test4,y_test4))

> Data
Your data needs to be stored as NumPy arrays or as a list of NumPy arrays. Ideally, you split the data in training and test sets, for which you can use the train_test_split function of sklearn.model_selection (formerly sklearn.cross_validation).

Keras Data Sets
>>> from keras.datasets import boston_housing, mnist, cifar10, imdb
>>> (x_train,y_train),(x_test,y_test) = mnist.load_data()
>>> (x_train2,y_train2),(x_test2,y_test2) = boston_housing.load_data()
>>> (x_train3,y_train3),(x_test3,y_test3) = cifar10.load_data()
>>> (x_train4,y_train4),(x_test4,y_test4) = imdb.load_data(num_words=20000)

Sequence Padding
>>> from keras.preprocessing import sequence
>>> x_train4 = sequence.pad_sequences(x_train4, maxlen=80)
>>> x_test4 = sequence.pad_sequences(x_test4, maxlen=80)

Train and Test Sets
>>> from sklearn.model_selection import train_test_split
>>> X_train5, X_test5, y_train5, y_test5 = train_test_split(X, y,
                                                            test_size=0.33,
                                                            random_state=42)

Standardization/Normalization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(x_train2)
>>> standardized_X = scaler.transform(x_train2)
>>> standardized_X_test = scaler.transform(x_test2)

One-Hot Encoding
>>> from keras.utils import to_categorical
>>> Y_test3 = to_categorical(y_test3, num_classes)

> Evaluate Your Model's Performance
>>> score = model3.evaluate(x_test,
                            y_test,
                            batch_size=32)

Early Stopping
>>> from keras.callbacks import EarlyStopping
>>> early_stopping_monitor = EarlyStopping(patience=2)
>>> model3.fit(x_train4,
               y_train4,
               batch_size=32,
               epochs=15,
               validation_data=(x_test4,y_test4),
               callbacks=[early_stopping_monitor])

> Save/Reload Models
>>> from keras.models import load_model
>>> model3.save('model_file.h5')
>>> my_model = load_model('model_file.h5')
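Pulling the pieces above together, a minimal end-to-end sketch of the compile/fit/evaluate cycle (with made-up random data so every name is defined):

>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import Dense
>>> data = np.random.random((1000, 8)) #1000 samples, 8 features (toy data)
>>> labels = np.random.randint(2, size=(1000, 1)) #Binary targets
>>> model = Sequential()
>>> model.add(Dense(12, input_dim=8, activation='relu'))
>>> model.add(Dense(1, activation='sigmoid'))
>>> model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
>>> model.fit(data, labels, epochs=10, batch_size=32)
>>> loss, acc = model.evaluate(data, labels, batch_size=32)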
Matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of formats.

> Plotting Routines
1D Data
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> ax.scatter(x,y,marker=".") #Scatter plot
>>> ax.plot(x,y,marker="o") #Line plot with markers
2D Data or Images
>>> U = -1 - X**2 + Y
>>> V = 1 + X - Y**2
>>> axes[0,1].streamplot(X,Y,U,V) #Plot a 2D field of arrows
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))

> Customize Plot
Mathtext
>>> plt.title(r'$\sigma_i=15$', fontsize=20)
Text & Annotations
>>> ax.text(1,
            -2.1,
            'Example Graph',
            style='italic')
>>> ax.annotate("Sine",
                xy=(8, 0),
                xycoords='data',
                xytext=(10.5, 0),
                textcoords='data',
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="arc3"))
Legends
>>> ax.set(title='An Example Axes', #Set a title and x-and y-axis labels
           ylabel='Y-Axis',
           xlabel='X-Axis')
>>> ax.legend(loc='best') #No overlapping plot elements
Ticks
>>> ax.tick_params(axis='y', #Make y-ticks longer and go in and out
                   direction='inout',
                   length=10)
Subplot Spacing
>>> fig3.subplots_adjust(wspace=0.5, #Adjust the spacing between subplots
                         hspace=0.3,
                         left=0.125,
                         right=0.9,
                         top=0.9,
                         bottom=0.1)
Axis Spines
>>> ax1.spines['top'].set_visible(False) #Make the top axis line for a plot invisible
>>> ax1.spines['bottom'].set_position(('outward',10)) #Move the bottom axis line outward

> Plot Anatomy & Workflow
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4] #Step 1: prepare data
>>> y = [10,20,25,30]
>>> fig = plt.figure() #Step 2: create figure
>>> fig.add_axes()
>>> ax = fig.add_subplot(111) #Step 3: add axes
>>> ax.plot(x, y, color='lightblue', linewidth=3) #Step 3/4: plot and customize
>>> ax.scatter([2,4,6],
               [5,15,25],
               color='darkgreen',
               marker='^')
>>> plt.savefig('foo.png') #Step 5: save plot
>>> plt.show() #Step 6: show plot

> Show Plot
>>> plt.show()

> Close and Clear
>>> plt.cla() #Clear an axis
>>> plt.clf() #Clear the entire figure
>>> plt.close() #Close a window
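The streamplot call in the Plotting Routines section above assumes X and Y grids that the extraction never shows; a minimal setup (the grid bounds here are assumptions) is:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j] #100x100 grid over [-3, 3] x [-3, 3]
>>> U = -1 - X**2 + Y #x-component of the vector field
>>> V = 1 + X - Y**2 #y-component of the vector field
>>> fig, axes = plt.subplots(2, 2)
>>> axes[0,1].streamplot(X, Y, U, V)
>>> plt.show()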
NumPy

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

> Creating Arrays
>>> a = np.array([1,2,3])
>>> b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
>>> c = np.array([[(1.5,2,3), (4,5,6)],[(3,2,1), (4,5,6)]], dtype = float)

> Data Types
>>> np.int64 #Signed 64-bit integer types
>>> np.float32 #Standard single-precision floating point
>>> np.complex128 #Complex numbers represented by two 64-bit floats
>>> np.bool_ #Boolean type storing TRUE and FALSE values
>>> b.dtype.name #Name of data type

> Subsetting, Slicing, Indexing
Subsetting
>>> a[2] #Select the element at the 2nd index
3
Slicing
>>> a[0:2] #Select items at index 0 and 1
array([1, 2])
>>> b[:1] #Select all items at row 0
array([[1.5, 2., 3.]])
>>> c[1,...] #Same as [1,:,:]
Fancy Indexing
>>> b[[1, 0, 1, 0],[0, 1, 2, 0]] #Select elements (1,0),(0,1),(1,2) and (0,0)
array([ 4. , 2. , 6. , 1.5])
>>> b[[1, 0, 1, 0]][:,[0,1,2,0]] #Select a subset of the matrix’s rows and columns
array([[ 4. , 5. , 6. , 4. ],
       [ 1.5, 2. , 3. , 1.5],
       [ 4. , 5. , 6. , 4. ],
       [ 1.5, 2. , 3. , 1.5]])

> Array Mathematics
>>> g = a - b #Subtraction
array([[-0.5, 0. , 0. ],
       [-3. , -3. , -3. ]])
>>> b + a #Addition
array([[ 2.5, 4. , 6. ],
       [ 5. , 7. , 9. ]])
>>> a / b #Division
array([[ 0.66666667, 1. , 1. ],
       [ 0.25 , 0.4 , 0.5 ]])
>>> a * b #Multiplication
array([[ 1.5, 4. , 9. ],
       [ 4. , 10. , 18. ]])
>>> np.sqrt(b) #Square root
>>> np.sin(a) #Element-wise sine
>>> np.cos(b) #Element-wise cosine
>>> np.log(a) #Element-wise natural logarithm

Aggregate Functions
>>> a.mean() #Mean
>>> np.median(b) #Median
>>> b.cumsum(axis=1) #Cumulative sum of the elements

> Array Manipulation
>>> i = np.transpose(b) #Permute array dimensions
>>> i.T #Permute array dimensions
>>> b.ravel() #Flatten the array
>>> d = np.arange(10,25,5)
>>> np.c_[a,d] #Create stacked column-wise arrays

> Saving & Loading
On Disk
>>> np.save('my_array', a)
>>> np.savez('array.npz', a, b)
>>> np.load('my_array.npy')
Text Files
>>> np.loadtxt("myfile.txt")
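In the b + a example above, NumPy broadcasts the shape-(3,) array a across both rows of the shape-(2,3) array b. A small sketch making the rule explicit:

>>> import numpy as np
>>> a = np.array([1, 2, 3]) #Shape (3,)
>>> b = np.array([(1.5, 2, 3), (4, 5, 6)]) #Shape (2, 3)
>>> b + a[np.newaxis, :] #What broadcasting does implicitly
array([[2.5, 4. , 6. ],
       [5. , 7. , 9. ]])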
Pandas Basics

Use the following import convention:
>>> import pandas as pd

> Retrieving Series/DataFrame Information
>>> df.shape #(rows,columns)
>>> df.count() #Number of non-NA values
>>> df.cumsum() #Cumulative sum of values

> I/O
Read and Write to CSV
>>> df.to_csv('myDataFrame.csv')
Read and Write to Excel
>>> pd.read_excel('file.xlsx')
Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
>>> df.to_sql('myDf', engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query(), as the sketch below shows.
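A minimal sketch of that wrapper's dispatch behavior (the in-memory engine and table name are assumptions; df is any DataFrame):

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> df.to_sql('my_table', engine) #Write a DataFrame out first
>>> pd.read_sql('my_table', engine) #A table name dispatches to read_sql_table()
>>> pd.read_sql("SELECT * FROM my_table;", engine) #A SQL string dispatches to read_sql_query()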
> Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f) #Apply function
>>> df.applymap(f) #Apply function element-wise

> Pandas Data Structures
Series
A one-dimensional labeled array capable of holding any data type
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])

> Selection (Also see NumPy Arrays)
Getting
>>> s['b'] #Get one element
-5
>>> df[1:] #Get subset of a DataFrame
   Country    Capital  Population
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
By Position
>>> df.iloc[0,0] #Select single value by row & column
'Belgium'
>>> df.iat[0,0]
'Belgium'
By Label
>>> df.loc[0, 'Country'] #Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position
>>> df.ix[2] #Select single row of subset of rows (deprecated; prefer .loc or .iloc)
Country        Brazil
Capital      Brasília
Population  207847528
Boolean Indexing
>>> s[~(s > 1)] #Series s where value is not >1
>>> s[(s < -1) | (s > 2)] #s where value is <-1 or >2
>>> df[df['Population']>1200000000] #Use filter to adjust DataFrame

> Dropping
>>> s.drop(['a', 'c']) #Drop values from rows (axis=0)
>>> df.drop('Country', axis=1) #Drop values from columns (axis=1)

> Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

> Asking For Help
>>> help(pd.Series.loc)
Python Basics

> Variables and Data Types
Variable Assignment
>>> x=5
>>> x
5
Calculations With Variables
>>> x+2 #Sum of two variables
7
>>> x-2 #Subtraction of two variables
3
>>> x*2 #Multiplication of two variables
10
>>> x**2 #Exponentiation of a variable
25
>>> x%2 #Remainder of a variable
1
>>> x/float(2) #Division of a variable
2.5

Types and Type Conversion
str() '5', '3.45', 'True' #Variables to strings
int() 5, 3, 1 #Variables to integers
float() 5.0, 1.0 #Variables to floats
bool() True, True, True #Variables to booleans

> Strings
>>> my_string = 'thisStringIsAwesome'
>>> a = 'is'
>>> b = 'nice'
String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True
String Methods
>>> my_string.upper() #String to uppercase
>>> my_string.lower() #String to lowercase
>>> my_string.count('w') #Count String elements
>>> my_string.replace('e', 'i') #Replace String elements
>>> my_string.strip() #Strip whitespaces

> Lists
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]
Selecting List Elements (index starts at 0)
Subset
>>> my_list[1] #Select item at index 1
Slice
>>> my_list[1:3] #Select items at index 1 and 2
>>> my_list2[1][:2] #my_list2 subsets of subsets
List Operations & Methods
>>> my_list.index(a) #Get the index of an item
>>> my_list.count(a) #Count an item
>>> my_list.append('!') #Append an item
>>> del(my_list[0:1]) #Remove an item
>>> my_list.insert(0,'!') #Insert an item
>>> my_list.sort() #Sort the list

> NumPy Arrays
>>> my_array = np.array([1, 2, 3, 4])
>>> my_2darray = np.array([[1,2,3],[4,5,6]])
Selecting NumPy Array Elements (index starts at 0)
Subset
>>> my_array[1] #Select item at index 1
2
Slice
>>> my_array[0:2] #Select items at index 0 and 1
array([1, 2])
Subset 2D Arrays
>>> my_2darray[:,0] #my_2darray[rows, columns]
array([1, 4])
NumPy Array Operations
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([ 6,  8, 10, 12])

> Asking For Help
>>> help(str)
Seaborn

The Python visualization library Seaborn is based on matplotlib and provides a high-level interface for drawing attractive statistical graphics.

1 Data (Also see Lists, NumPy & Pandas)
>>> import pandas as pd
>>> import numpy as np
>>> uniform_data = np.random.rand(10, 12)
>>> data = pd.DataFrame({'x':np.arange(1,101),
                         'y':np.random.normal(0,4,100)})
Seaborn also offers built-in data sets:
>>> titanic = sns.load_dataset("titanic")
>>> iris = sns.load_dataset("iris")

Boxplot
>>> sns.boxplot(x="alive", #Boxplot
                y="age",
                hue="adult_male",
                data=titanic)
>>> sns.boxplot(data=iris, orient="h") #Boxplot with wide-form data

Violinplot
>>> sns.violinplot(x="age", #Violin plot
                   y="sex",
                   hue="survived",
                   data=titanic)

Plot Customization (Also see Matplotlib)
>>> plt.title("A Title") #Add plot title
>>> plt.ylabel("Survived") #Adjust the label of the y-axis
>>> plt.xlabel("Sex") #Adjust the label of the x-axis
>>> plt.ylim(0,100) #Adjust the limits of the y-axis
>>> plt.xlim(0,10) #Adjust the limits of the x-axis
>>> plt.setp(ax, yticks=[0,5]) #Adjust a plot property
>>> plt.tight_layout() #Adjust subplot params
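The uniform_data frame above is otherwise unused in these fragments; a minimal sketch of the matrix plot it could feed (vmax=1 is an assumption matching the 0-1 range of np.random.rand):

>>> import numpy as np
>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> uniform_data = np.random.rand(10, 12)
>>> sns.heatmap(uniform_data, vmax=1) #Heatmap of the 10x12 random matrix
>>> plt.show()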
Plotting With Bokeh

The five basic steps (prepare data; create a new plot; add renderers with visual customizations; specify where to generate the output; show or save the results), annotated on a complete example:

>>> from bokeh.plotting import figure
>>> from bokeh.io import output_file, show
>>> x = [1, 2, 3, 4, 5] #Step 1
>>> y = [6, 7, 2, 4, 5]
>>> p = figure(title="simple line example", #Step 2
               x_axis_label='x',
               y_axis_label='y')
>>> p.line(x, y, legend="Temp.", line_width=2) #Step 3
>>> output_file("lines.html") #Step 4
>>> show(p) #Step 5

1 Data (Also see Lists, NumPy & Pandas)
Under the hood, your data is converted to Column Data Sources. You can also do this manually:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array([[33.9,4,65, 'US'],
                                [32.4,4,66, 'Asia'],
                                [21.4,4,109, 'Europe']]),
                      columns=['mpg','cyl', 'hp', 'origin'],
                      index=['Toyota', 'Fiat', 'Volvo'])
>>> from bokeh.models import ColumnDataSource
>>> cds_df = ColumnDataSource(df)

Colormapping
>>> from bokeh.models import CategoricalColorMapper
>>> color_mapper = CategoricalColorMapper(
        factors=['US', 'Asia', 'Europe'],
        palette=['blue', 'red', 'green'])
>>> p3.circle('mpg', 'cyl', source=cds_df,
              color=dict(field='origin',
                         transform=color_mapper),
              legend='Origin')

Legend Location
Inside Plot Area
>>> p.legend.location = 'bottom_left'
Outside Plot Area
>>> from bokeh.models import Legend
>>> r1 = p2.asterisk(np.array([1,2,3]), np.array([3,2,1]))
>>> r2 = p2.line([1,2,3,4], [3,4,5,6])
>>> legend = Legend(items=[("One", [p1, r1]), ("Two", [r2])],
                    location=(0, -30))
>>> p.add_layout(legend, 'right')

Legend Orientation
>>> p.legend.orientation = "horizontal"
>>> p.legend.orientation = "vertical"

Legend Background & Border
>>> p.legend.border_line_color = "navy"
>>> p.legend.background_fill_color = "white"

4 Output & Export
Notebook
>>> from bokeh.io import output_notebook, show
>>> output_notebook()
HTML
Standalone HTML
>>> from bokeh.embed import file_html
>>> from bokeh.resources import CDN
>>> html = file_html(p, CDN, "my_plot")
>>> from bokeh.io import output_file, show
>>> output_file('my_bar_chart.html', mode='cdn')
Components
>>> from bokeh.embed import components
>>> script, div = components(p)
PNG
>>> from bokeh.io import export_png
>>> export_png(p, filename="plot.png")
SVG
>>> from bokeh.io import export_svgs
>>> p.output_backend = "svg"
>>> export_svgs(p, filename="plot.svg")
SciPy

The SciPy library is one of the core packages for scientific computing that provides mathematical algorithms and convenience functions built on the NumPy extension of Python.

> Interacting With NumPy (Also see NumPy)
>>> import numpy as np
>>> a = np.array([1,2,3])
>>> b.flatten() #Flatten the array
>>> np.real_if_close(c, tol=1000) #Return a real array if complex parts are close to 0
>>> g = np.linspace(0, np.pi, num=5) #Create an array of evenly spaced values (number of samples)
>>> g[3:] += np.pi
>>> np.unwrap(g) #Unwrap
>>> np.select([c<4], [c*2]) #Return values from a list of arrays depending on conditions
>>> F = np.eye(3, k=1) #Create a 3x3 array with ones on the first superdiagonal
>>> p = poly1d([3,4,5]) #Create a polynomial object

Vectorizing Functions
>>> def myfunc(a):
...     if a < 0:
...         return a*2
...     else:
...         return a/2
>>> np.vectorize(myfunc) #Vectorize functions

> Linear Algebra
Creating Matrices
>>> A = np.matrix(np.random.random((2,2)))
>>> B = np.asmatrix(b)
>>> C = np.mat(np.random.random((10,5)))
>>> D = np.mat([[3,4], [5,6]])

Basic Matrix Routines
>>> np.divide(A,D) #Division
>>> np.trace(A) #Trace
>>> linalg.norm(A,1) #L1 norm (max column sum)
>>> np.linalg.matrix_rank(C) #Matrix rank

Matrix Functions
>>> linalg.expm(A) #Matrix exponential (the Taylor-series expm2 variant is deprecated)
>>> linalg.cosm(D) #Matrix cosine

Solving Linear Problems
>>> linalg.solve(A,b) #Solver for dense matrices

Decompositions
Singular Value Decomposition
>>> U,s,Vh = linalg.svd(B) #Singular Value Decomposition (SVD)
>>> Sig = linalg.diagsvd(s,M,N) #Construct sigma matrix in SVD
LU Decomposition
>>> P,L,U = linalg.lu(C) #LU Decomposition

Sparse Matrix Routines
>>> sparse.linalg.inv(I) #Inverse
>>> sparse.linalg.norm(I) #Norm

> Asking For Help
>>> help(scipy.linalg.diagsvd)
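A quick sketch of how the SVD pieces above fit back together (M and N are the row and column counts of the decomposed matrix; the random matrix is an assumption):

>>> import numpy as np
>>> from scipy import linalg
>>> B = np.random.random((3, 2))
>>> U, s, Vh = linalg.svd(B) #U: (3,3), s: (2,), Vh: (2,2)
>>> M, N = B.shape
>>> Sig = linalg.diagsvd(s, M, N) #Embed singular values in an (M,N) matrix
>>> np.allclose(B, U @ Sig @ Vh) #The factors reproduce B
True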
row="sex")
>>> g = g.map(plt.hist,"age")
y="sepal_length",
data=iris,
ax=ax)
>>> sns.factorplot(x="pclass", #Draw a categorical plot onto a
Facetgrid
y="survived",
>>> sns.lmplot(x="sepal_width", #Plot data and regression model fits across a FacetGrid
y="sepal_length",
>>> plot = sns.distplot(data.y, #Plot univariate distribution
hue="species",
kde=False,
data=iris)
color="b")
>>> h = sns.PairGrid(iris) #Subplot grid for plotting pairwise
relationships
y="y",
The Python visualization library Seaborn is based on matplotlib and provides data=data)
Categorical Plots
"sepal_width",
y="petal_length",
data=iris)
Bar Chart
4. Further customize your plot
Axisgrid Objects >>> sns.barplot(x="sex", #Show point estimates & confidence intervals with scatterplot glyphs
hue="class",
y="total_bill",
>>> h.set(xlim=(0,5), #Set the limit and ticks of the
x-and y-axis
palette="Greens_d")
data=tips,
ylim=(0,5),
aspect=2)
xticks=[0,2.5,5],
Point Plot
>>> g = (g.set_axis_labels("Tip","Total bill(USD)").
yticks=[0,2.5,5])
>>> sns.pointplot(x="class", #Show point estimates &
confidence intervals as
rectangular bars
set(xlim=(0,10),ylim=(0,100)))
y="survived",
data=titanic,
palette={"male":"g",
Boxplot
linestyles=["-","--"])
data=titanic)
2 Figure Aesthetics (Also see Matplotlib)
>>> sns.set_style("whitegrid") #Set the matplotlib parameters
>>> sns.set_style("ticks", #Set the matplotlib parameters
                  {"xtick.major.size":8,
                   "ytick.major.size":8})
>>> sns.axes_style("whitegrid") #Return a dict of params or use with "with" to temporarily set the style

Color Palette
>>> sns.color_palette() #Define the color palette
>>> sns.set_palette("husl", 3) #Set your own color palette

5 Show or Save Plot (Also see Matplotlib)
>>> plt.show() #Show the plot
>>> plt.savefig("foo.png", #Save transparent figure
                transparent=True)

Close & Clear (Also see Matplotlib)
>>> plt.cla() #Clear an axis
>>> plt.clf() #Clear an entire figure
>>> plt.close() #Close a window
Scikit-learn

Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface; a minimal end-to-end sketch follows at the end of this section.

> Training And Test Data
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X,
                                                        y,
                                                        random_state=0)

> Preprocessing The Data
Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)
Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)
Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

> Create Your Model
Supervised Learning Estimators
Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression(normalize=True)
Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
KNN
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=5)
Unsupervised Learning Estimators
K-Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

> Model Fitting
>>> knn.fit(X_train, y_train) #Fit the model to the data
>>> k_means.fit(X_train) #Fit the model to the data

> Prediction
>>> y_pred = svc.predict(np.random.random((2,5))) #Predict labels

> Evaluate Your Model's Performance
Classification Metrics
Accuracy Score
>>> knn.score(X_test, y_test) #Estimator score method
>>> from sklearn.metrics import accuracy_score #Metric scoring functions
>>> accuracy_score(y_test, y_pred)
Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))
Regression Metrics
>>> from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
Clustering Metrics
Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

> Tune Your Model
Grid Search
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3),
              "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)
Randomized Parameter Optimization
>>> from sklearn.model_selection import RandomizedSearchCV
>>> rsearch = RandomizedSearchCV(estimator=knn,
                                 param_distributions=params,
                                 n_iter=8,
                                 random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)
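The end-to-end sketch promised in the intro, on a toy dataset (the iris data and KNN settings are assumptions; the point is the fit/predict/score cycle):

>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.metrics import accuracy_score
>>> X, y = datasets.load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> knn = KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train) #Fit on training data
>>> y_pred = knn.predict(X_test) #Predict labels for held-out data
>>> accuracy_score(y_test, y_pred) #Evaluate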
spaCy Cheat Sheet

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It's designed specifically for production use and helps you build applications that process and "understand" large volumes of text.

>>> import spacy

Statistical Models
Model packages predict part-of-speech tags, syntactic dependencies, named entities and more. See here for available models: spacy.io/models
$ python -m spacy download en_core_web_sm
Check that your installed models are up to date:
$ python -m spacy validate
>>> nlp = spacy.load("en_core_web_sm") #Load the installed model "en_core_web_sm"

Documents, Tokens and Spans
Processing text with the nlp object returns a Doc object that holds all information about the tokens, their linguistic features and their relationships.

Accessing token attributes
>>> doc = nlp("This is a text")
>>> [token.text for token in doc] #Token texts
['This', 'is', 'a', 'text']

Accessing spans
Span indices are exclusive. So doc[2:4] is a span starting at token 2, up to – but not including! – token 4.
>>> span = doc[2:4]
>>> span.text
'a text'

Creating a span manually
>>> from spacy.tokens import Span #Import the Span object
>>> doc = nlp("I live in New York")
>>> span = Span(doc, 3, 5, label="GPE") #Span for "New York" with label GPE (geopolitical)
>>> span.text
'New York'

Linguistic Features
Attributes return label IDs. For string labels, use the attributes with an underscore. For example, token.pos_ .

Syntactic dependencies (predicted by statistical model)
>>> doc = nlp("This is a text.")
>>> [token.dep_ for token in doc] #Dependency labels
>>> [token.head.text for token in doc] #Syntactic head token (governor)

Named Entity Recognition (predicted by statistical model)
>>> doc = nlp("Larry Page founded Google")
>>> [(ent.text, ent.label_) for ent in doc.ents] #Text and label of named entity span
[('Larry Page', 'PERSON'), ('Google', 'ORG')]

Sentence Boundary Detection (usually needs the dependency parser)
>>> doc = nlp("This a sentence. This is another one.")
>>> [sent.text for sent in doc.sents] #doc.sents is a generator that yields sentence spans

Base noun phrases (needs the tagger and parser)
>>> doc = nlp("I have a red car")
>>> [chunk.text for chunk in doc.noun_chunks]

Label explanations
>>> spacy.explain("RB")
'adverb'
>>> spacy.explain("GPE")
'Countries, cities, states'

Visualizing
If you're in a Jupyter notebook, use displacy.render; otherwise, use displacy.serve to start a web server and show the visualization in your browser.
>>> from spacy import displacy
>>> doc = nlp("This is a sentence")
>>> displacy.render(doc, style="dep") #Visualize syntactic dependencies
>>> doc = nlp("Larry Page founded Google")
>>> displacy.render(doc, style="ent") #Visualize named entities

Word Vectors and Similarity
Comparing similarity
>>> doc1 = nlp("I like cats")
>>> doc2 = nlp("I like dogs")
>>> doc1.similarity(doc2) #Compare 2 documents
>>> doc1[2].similarity(doc2[2]) #Compare 2 tokens
>>> doc[2].vector_norm #L2 norm of a token's vector

Pipeline Components
>>> nlp.pipeline
[('tagger', <spacy.pipeline.Tagger>),
 ('parser', <spacy.pipeline.DependencyParser>),
 ('ner', <spacy.pipeline.EntityRecognizer>)]

Custom components
>>> def custom_component(doc): #Function that modifies the doc and returns it
...     return doc
>>> nlp.add_pipe(custom_component, first=True) #Add the component first in the pipeline

Extension Attributes
Custom attributes registered on the global Doc, Token and Span classes become available via ._ . Attribute extensions (with default value), property extensions (with getter and setter) and method extensions (callable method) are supported. For example, a method extension:
>>> from spacy.tokens import Span
>>> Span.set_extension("has_label", method=lambda span, label: span.label_ == label) #Register custom attribute on Span class
>>> doc[3:5]._.has_label("GPE")
True

Rule-Based Matching
>>> from spacy.matcher import Matcher
>>> matcher = Matcher(nlp.vocab) #Initialize the matcher with the shared vocab
# Each dict represents one token and its attributes
>>> pattern1 = [{"LEMMA": "love"}, {"LOWER": "cats"}]
>>> matcher.add("LOVE_CATS", None, pattern1)
# "book", "a cat", "the sea" (noun + optional article)
>>> pattern2 = [{"POS": "DET", "OP": "?"}, {"POS": "NOUN"}]
>>> matcher.add("ARTICLE_NOUN", None, pattern2)
>>> matches = matcher(doc)
>>> for match_id, start, end in matches:
...     span = doc[start:end] #Get the matched span by slicing the Doc
...     print(span.text)

Operators and quantifiers
Can be added to a token dict as the "OP" key:
!  Negate pattern and match exactly 0 times
?  Make pattern optional and match it 0 or 1 times
+  Require pattern to match 1 or more times
*  Allow pattern to match 0 or more times
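A self-contained sketch of the "?" quantifier in action under spaCy v2 (the sentence and pattern name are assumptions):

>>> import spacy
>>> from spacy.matcher import Matcher
>>> nlp = spacy.load("en_core_web_sm")
>>> matcher = Matcher(nlp.vocab)
>>> matcher.add("ARTICLE_NOUN", None,
...             [{"POS": "DET", "OP": "?"}, {"POS": "NOUN"}])
>>> doc = nlp("I bought a book about the sea")
>>> for _, start, end in matcher(doc):
...     print(doc[start:end].text) #Matches both "a book" and "book": the article is optional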
Glossary
Statistical model: Process for making predictions based on examples.
Training: Updating a statistical model with new examples.
Dependency parsing: Assigning syntactic dependency labels, describing the relations between individual tokens, like subject or object.
Named Entity Recognition (NER): Labeling named "real-world" objects, like persons, companies or locations.
TensorFlow v2.0 Cheat Sheet

A Reference Machine Learning Workflow

tf.data.Dataset represents a sequence of elements, each containing one or more Tensor objects. This can be exemplified by a pair of tensors representing an image and a corresponding class label.
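As a quick illustration of that element structure (a toy sketch, not part of the original workflow):

import tensorflow as tf

# Two tiny 2x2 "images" paired with two integer class labels
images = tf.random.uniform([2, 2, 2])
labels = tf.constant([0, 1])
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
for image, label in dataset:
    print(image.shape, label.numpy())  # each element is an (image, label) pair

The sheet's own workflow example follows: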
import tensorflow as tf

DATASET_URL = "https://archive.ics.uci.edu/ml/machine-" \
              "learning-databases/covtype/covtype.data.gz"
DATASET_SIZE = 387698
dataset_path = tf.keras.utils.get_file(
    fname=DATASET_URL.split('/')[-1], origin=DATASET_URL)
COLUMN_NAMES = [
    'Elevation', 'Aspect', 'Slope',
    'Horizontal_Distance_To_Hydrology',
    'Vertical_Distance_To_Hydrology',
    'Horizontal_Distance_To_Roadways',
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Fire_Points', 'Soil_Type',
    'Cover_Type']

def _parse_line(line):
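# The body of _parse_line is lost in the extraction; what follows is only a
# plausible sketch of the contract it must satisfy for the csv_input_fn call
# below. (Raw covtype rows have 55 columns; the original presumably maps them
# down to COLUMN_NAMES first, so treat the defaults and indexing as assumptions.)
    fields = tf.io.decode_csv(line, record_defaults=[0.0] * len(COLUMN_NAMES))
    features = dict(zip(COLUMN_NAMES[:-1], fields[:-1]))
    label = tf.cast(fields[-1], tf.int32)
    return features, label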
# Build, train, and evaluate the estimator
model = tf.estimator.LinearClassifier(feature_columns,
                                      n_classes=4)
model.train(input_fn=lambda: csv_input_fn(dataset_path),
            steps=10000)
model.evaluate(
    input_fn=lambda: csv_input_fn(dataset_path, test=True))

serving_input_fn = _builder(_spec_maker(feature_columns))
export_path = model.export_saved_model(
    "/tmp/from_estimator/", serving_input_fn)
The following code sample shows how to load and use the
saved model with Python.
# Import model from SavedModel
imported = tf.saved_model.load(export_path)
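The sample stops after loading. One way to inspect and call the restored model (a sketch; the exact input key depends on the serving_input_fn built above):

# List available signatures and their expected inputs
print(list(imported.signatures.keys()))
infer = imported.signatures["serving_default"]
print(infer.structured_input_signature)  # shows the expected input tensors
# Estimator serving functions built from parsing receivers typically take
# serialized tf.train.Example protos, so a call would look roughly like:
# outputs = infer(inputs=tf.constant([serialized_example]))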