
TFX Tutorial

Developing Production ML Pipelines

Aurélien Géron
Consultant
@aureliengeron
What are we doing here?
What does it all mean?

2
In addition to training an amazing model ...

Modeling Code

… a production solution requires so much more:

● Data Collection, Data Verification, Feature Extraction
● Configuration, Machine Resource Management, Process Management Tools
● Analysis Tools, Monitoring, Serving Infrastructure

Production Machine Learning
“Hidden Technical Debt in Machine Learning Systems”, NIPS 2015
http://bit.ly/ml-techdebt
Production Machine Learning = Machine Learning Development + Modern Software Development

Machine Learning Development:
● Labeled data
● Feature space coverage
● Minimal dimensionality
● Maximum predictive data
● Fairness
● Rare conditions
● Data lifecycle management

Modern Software Development:
● Scalability
● Extensibility
● Configuration
● Consistency & Reproducibility
● Modularity
● Best Practices
● Testability
● Monitoring
● Safety & Security
TensorFlow Extended (TFX)
Powers Alphabet’s most important bets and products
… and some of Google’s most important partners.

“... we have re-tooled our machine learning platform to use TensorFlow. This yielded significant productivity gains while positioning ourselves to take advantage of the latest industry research.”
Ranking Tweets with TensorFlow - Twitter blog post
When to use TFX
We’re learning how to create an ML pipeline using TFX
○ TFX pipelines are appropriate when:
■ datasets are large, or may someday get large
■ training/serving consistency is important
■ version management for inference is important

○ Google uses TFX pipelines for everything from single-node to large-scale ML training and inference

11
What we’re doing
We’re following a typical ML development process
● Understanding our data
● Feature engineering
● Training
● Analyze model performance
● Lather, rinse, repeat
● Ready to deploy to production
12
Data ingestion

Data ingestion → Data validation
e.g., age is missing; country not in: ● China ● India ● USA

Data ingestion → Data validation → Data transform
e.g., age ➜ normalized_age; China ➜ [1, 0, 0]; India ➜ [0, 1, 0]; USA ➜ [0, 0, 1]

Data ingestion → Data validation → Data transform → Model training

... → Model training → Model analysis
e.g., track Accuracy across Model versions, and compare the new version with previous ones

... → Model training → Model analysis → Model push

... → Model push → Model serving: Requests in, Predictions out
Each stage of the pipeline maps to a TFX library (other libraries can be substituted at each stage):

Data validation → TF Data Validation (TFDV)
Data transform → TF Transform (TFT)
Model analysis → TF Model Analysis (TFMA)
Model serving → TF Serving (TFS)

TFDV, TFT, TFMA and TFS (or alternatives) all record their runs in the ML Metadata Store (MLMD), backed by a SQL database, with the data & models stored alongside it. Together with MLMD, these libraries make up TFX.

Deployment choices are made layer by layer:

Orchestration: ...
Metadata Store: ... (SQL)
Processing API: ...
Beam Runner: Google Cloud Dataflow, or ...
Orchestration: Manual w/ InteractiveContext

Metadata Store:

Processing API:

Beam Runner:
Local Runner
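For the exercises, the components are run one at a time from a notebook. A minimal sketch of that setup, assuming the InteractiveContext import path used by recent TFX releases (it has moved between versions):

from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Creates a temporary pipeline root and an ephemeral SQLite-backed metadata store.
context = InteractiveContext()

# Each component is then run manually, e.g.:
# context.run(example_gen)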
Exercise 1
Prerequisites
● Linux / MacOS
● Python 3.6
● Virtualenv
● Git

42
Step 1: Setup your environment
% sudo apt-get update

% sudo apt-get install -y python3-dev python3-pip virtualenv

% sudo apt-get install -y build-essential libssl-dev libffi-dev

% sudo apt-get install -y libxml2-dev libxslt1-dev zlib1g-dev

% sudo apt-get install -y git-core

43
Step 1: Setup your environment
% cd

% virtualenv -p python3.6 tfx_env

% source tfx_env/bin/activate

(tfx_env) mkdir tfx; cd tfx

(tfx_env) pip install tensorflow==2.0.0

(tfx_env) pip install tfx==0.14.0

44
TFX End-to-End Example

Predicting Online News Popularity


45
TFX End-to-End Example

Online News Popularity Dataset

Features

Categorical Features: data_channel (Lifestyle, Tech…), weekday (Monday, Tuesday…)
Date Features: publication_date
Text Features: slug (e.g., snow-dogs)
Numerical Features: n_unique_tokens, n_hrefs, n_imgs, global_subjectivity, kw_avg_max (best kw avg shares), self_reference_avg_shares, global_sentiment_polarity, ...

Label = n_shares
46
                      Parses    Transforms    Expects Label
train_input_fn          No          No             Yes
eval_input_fn           No          No             Yes
serving_receiver_fn     Yes         Yes            No
receiver_fn (TFMA)      Yes         Yes            Yes

receiver_fn (TFMA) must return both the raw features and the transformed features.
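As a rough illustration of the serving_receiver_fn row above, here is a minimal sketch (not the exact code used in the labs): it parses serialized tf.Example protos, drops the label, and applies the tf.Transform graph. The helper name and the use of 'n_shares' as the label key are assumptions based on this dataset.

import tensorflow as tf

def _serving_receiver_fn(tf_transform_output, raw_feature_spec):
    # Parses raw examples, applies the transform graph, expects no label.
    serialized = tf.compat.v1.placeholder(dtype=tf.string, shape=[None])
    raw_features = tf.io.parse_example(serialized, raw_feature_spec)
    raw_features.pop('n_shares', None)   # no label at serving time (assumed label name)
    transformed = tf_transform_output.transform_raw_features(raw_features)
    return tf.estimator.export.ServingInputReceiver(
        features=transformed,
        receiver_tensors={'examples': serialized})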
Lab 1

Running a simple TFX pipeline


manually in a Colab Notebook
52
ML Coding vs ML Engineering

53
ML Coding vs ML engineering

The ML Code box is only a small part of the picture, surrounded by: Data Collection, Data Verification, Feature Extraction, Configuration, Machine Resource Management, Analysis Tools, Process Management Tools, Monitoring, Serving Infrastructure.

Adapted from: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
Writing Software (Programming)

Programming in the small (Coding) → Programming in the large (Engineering)

● Monolithic code → Modular design and implementation
● Non-reusable code → Libraries for reuse (ideally across languages)
● Undocumented code → Well documented contracts and abstractions
● Untested code → Well tested code (exhaustively and at scale)
● Unbenchmarked or hack-optimized-once code → Continuously benchmarked and optimized code
● Unverified code → Reviewed and peer verified code
● Undebuggable code or ad hoc tooling → Debuggable code and debug tooling
● Uninstrumented code → Instrumentable and instrumented code
● ...
Writing ML Software (The “Code” view)

ML Programming in the small (Coding) → ML Programming in the large (Engineering)

(same rows as the previous table)

This slide is, not surprisingly, the same as the previous one; however, it is only half the story :)
Engineering

// Strong Contracts.
Output Program(Inputs) {
... Human authored and peer reviewed code ...
}

// Exhaustive testing of Contracts and Performance.


TestProgramCommonCase1...N {
...
}

TestProgramEdgeCase1...N() {
EXPECT_EQ(..., Program(...))
}

BenchmarkProgramWorstCase1...N {
...
}
Engineering vs ML Engineering

(Left half: the Engineering code from the previous slide. Right half, the ML Engineering counterpart:)

// Subjective representations and unclear objectives
LearnedProgram Learning(ProblemStatement, Data) {
  ... Human authored and peer reviewed ML pipelines ...
}

// Strong Contracts → Unclear Contracts
Output LearnedProgram(Inputs) {
  ... Opaque “Program” (aka Model) ...
}

// Exhaustive testing of Contracts and Performance → Peer Reviewing of ProblemStatement

// Data Validation
Expectations for Data on
● Shape, Invariants, Distribution(s), ...

// Model Validation
Expectations for LearnedProgram “average” behavior on:
● “Metrics” {Quality, Fairness, Perf, ...}
  Cross Product
● “Data Slices” {Global, UserChosen, AutoChosen, ...}
Writing ML Software (The “Data and other Artifacts” view)

ML Programming in the small (Coding) → ML Programming in the large (Engineering)

● (Monolithic code) Fixed Datasets → Evolving Datasets (Data, Features, ...) and Objectives
● (Non-reusable code) Unmergeable Artifacts → Reusable Models aka Modules, Mergeable Statistics ...
● (Undocumented code) No Problem Statements → Problem Statements, Discoverable Artifacts
● (Untested code) Non-validated Datasets, Models → Expectations, Data Validation, Model Validation ...
● (Unbenchmarked or hack-optimized once code) Models → Quality and Performance Benchmarked Models ...
● (Unverified code) Biased Datasets / Artifacts → {Data, Model} x {Understanding, Fairness}
● (Undebuggable code or ad hoc tooling) → Visualizations, Summarizations, Understanding ...
● (Uninstrumented code) → Full Artifact Lineage
● ...
This is the remaining half!
Introduction to Apache Beam

60
What is Apache Beam?

- A unified batch and stream distributed processing API


- A set of SDK frontends: Java, Python, Go, Scala, SQL
- A set of Runners which can execute Beam jobs into various
backends: Local, Apache Flink, Apache Spark, Apache
Gearpump, Apache Samza, Apache Hadoop, Google Cloud
Dataflow, …
Apache Beam: the same pipeline can be written in any SDK:

Java:    input.apply(Sum.integersPerKey())
Python:  input | Sum.PerKey()
Go:      stats.Sum(s, input)
SQL:     SELECT key, SUM(value) FROM input GROUP BY key

... and the resulting “Sum Per Key” job can run on many backends: Cloud Dataflow, Apache Flink, Apache Spark, Apache Apex, Gearpump, IBM Streams, Apache Samza, Apache Nemo (incubating), ...

Beam Portability Framework
● Currently most runners support the Java SDK only
● Portability framework (https://beam.apache.org/roadmap/portability/) aims to provide full interoperability across the Beam ecosystem
● Portability API
○ Protobufs and gRPC for broad language support
○ Job submission and management: The Runner API
○ Job execution: The SDK harness
● Python Flink and Spark runners use Portability Framework
Beam Portability Support Matrix
Hello World Example
import apache_beam as beam

pipeline = beam.Pipeline()

lines = (pipeline

| "Create" >> beam.Create(["Hello", "World", "!!!"])

| "Print" >> beam.ParDo(print))

result = pipeline.run()

result.state
Hello World Example
with beam.Pipeline() as pipeline:

lines = (pipeline

| "Create" >> beam.Create(["Hello", "World", "!!!"])

| "Print" >> beam.ParDo(print))


Concepts
● Pipeline
● PCollection
● PTransform
● I/O transforms
Pipeline
● A Pipeline encapsulates your entire data processing
task
● This includes reading input data, transforming that
data, and writing output data.
● All Beam driver programs must create a Pipeline.
● You can specify the execution options when
creating the Pipeline to tell it where and how to run.
Pipeline
pipeline = beam.Pipeline()

lines = (pipeline

| "Create" >> beam.Create(["Hello", "World", "!!!"])

| "Print" >> beam.ParDo(print))

result = pipeline.run()

result.state
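The slide above mentions specifying execution options when creating the Pipeline. A small sketch of what that looks like (DirectRunner is just an illustrative choice):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(["--runner=DirectRunner"])   # where and how to run
with beam.Pipeline(options=options) as pipeline:
    _ = (pipeline
         | "Create" >> beam.Create([1, 2, 3])
         | "Print" >> beam.Map(print))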
PCollection
● A distributed dataset your Beam pipeline operates on.
● The dataset can be bounded (from fixed source) or
unbounded (from a continuously updating source).
● The pipeline typically creates a source PCollection by
reading data from an external data source
○ But you can also create a PCollection from in-memory data within your driver program.

● From there, PCollections are the inputs and outputs


for each step in your pipeline.
PCollection
pipeline = beam.Pipeline()

lines = (pipeline

| "Create" >> beam.Create(["Hello", "World", "!!!"])

| "Print" >> beam.ParDo(print))

result = pipeline.run()

result.state
PTransform
● A PTransform represents a data processing operation,
or a step, in your pipeline.
● Every PTransform takes one or more PCollection
objects as input
● It performs a processing function that you provide on
the elements of that PCollection.
● It produces zero or more output PCollection objects.
PTransform
pipeline = beam.Pipeline()

lines = (pipeline
    | "Create" >> beam.Create(["Hello", "World", "!!!"])
    | "Print" >> beam.ParDo(print))

result = pipeline.run()
result.state

output:
Hello
World
!!!
I/O Transforms
● Beam comes with a number of built-in “IO” PTransforms.
● They read data from, or write data to, various external storage systems.
I/O Transforms
with beam.Pipeline() as pipeline:
lines = (pipeline
| beam.io.ReadFromTFRecord("test_in.tfrecord")
| beam.Map(lambda line: line + b' processed')
| beam.io.WriteToTFRecord("test_out.tfrecord"))
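TFRecord is only one of the built-in IOs; text files work the same way. A minimal sketch (file names are illustrative):

import apache_beam as beam

with beam.Pipeline() as pipeline:
    _ = (pipeline
         | "Read" >> beam.io.ReadFromText("input.txt")
         | "Upper" >> beam.Map(str.upper)
         | "Write" >> beam.io.WriteToText("output", file_name_suffix=".txt"))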
Lab 2

Introduction to Apache Beam


76
TFX’s Beam Orchestrator

77
Orchestration: Beam Orchestrator

Metadata Store:

Processing API:

Beam Runner:
Local Runner
Exercise 3
from tfx.orchestration import pipeline

pipeline.Pipeline(
pipeline_name=pipeline_name,
pipeline_root=pipeline_root,
components=[
example_gen, statistics_gen, infer_schema, validate_stats,
transform, trainer, model_analyzer, model_validator, pusher
],
enable_cache=True,
metadata_connection_config=sqlite_metadata_connection_config(
metadata_path),
additional_pipeline_args={},
)
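Once the Pipeline object above is defined, the Beam orchestrator runs it. A minimal sketch, assuming the BeamDagRunner import path of the TFX 0.14 line (it may differ in other releases):

from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

# 'my_pipeline' stands for the pipeline.Pipeline(...) object defined above.
BeamDagRunner().run(my_pipeline)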
Lab 3

On-Prem with Beam Orchestrator


81
TensorFlow Data Validation

82
Data Exploration & Cleanup
The first task in any data science or ML project is to understand and clean the data

● Understand the data types for each feature


● Look for anomalies and missing values
● Understand the distributions for each feature

83
import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_csv(
data_location=_train_data_filepath)

tfdv.visualize_statistics(train_stats)
85
tfdv.visualize_statistics(
lhs_statistics=eval_stats,
rhs_statistics=train_stats,
lhs_name='EVAL_DATASET',
rhs_name='TRAIN_DATASET')
schema = tfdv.infer_schema(statistics=train_stats)
tfdv.display_schema(schema=schema)
anomalies = tfdv.validate_statistics(
statistics=eval_stats,
schema=schema)

tfdv.display_anomalies(anomalies)
# Relax the minimum fraction of values that must come from
# the domain for feature company.
company = tfdv.get_feature(schema, 'company')
company.distribution_constraints.min_domain_mass = 0.9

# Add new value to the domain of feature payment_type.


payment_type = tfdv.get_domain(schema, 'payment_type')
payment_type.value.append('bitcoin')
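After relaxing the schema like this, it is worth re-running validation to confirm that the anomalies disappear; a short sketch using the same TFDV calls as above:

updated_anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
tfdv.display_anomalies(updated_anomalies)   # should now report no anomalies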
# All features are by default in both TRAINING
# and SERVING environments.
schema.default_environment.append('TRAINING')
schema.default_environment.append('SERVING')

# Specify that 'tips' feature is not in SERVING


# environment.
n_shares_feature = tfdv.get_feature(schema, 'n_shares')
n_shares_feature.not_in_environment.append('SERVING')

serving_anomalies_with_env = tfdv.validate_statistics(
serving_stats, schema, environment='SERVING')
# Add skew comparator for 'weekday' feature.
weekday = tfdv.get_feature(schema, 'weekday')
weekday.skew_comparator.infinity_norm.threshold = 0.01

# Add drift comparator for 'title_subjectivity' feature.


title_subjectivity = tfdv.get_feature(schema, 'title_subjectivity')
title_subjectivity.drift_comparator.infinity_norm.threshold = 0.001

skew_anomalies = tfdv.validate_statistics(
train_stats,
schema,
previous_statistics=eval_stats,
serving_statistics=serving_stats)
Lab 4

TensorFlow Data Validation (TFDV)


93
TensorFlow Transform

94
Data Preprocessing
The raw data usually needs to be prepared before being fed to a Machine
Learning model. This may involve several transformations:

● Fill in missing values
● Normalize features
● Bucketize features
● Zoom/Crop images
● Augment images
● Feature crosses
● Vocabularies
● Embeddings
● PCA
● Categorical encoding
95
Training/Serving Skew
● Preprocessing data before training
● Same preprocessing required at serving time
● Possibly with multiple serving environments
● Risk of discrepancy
In-model preprocessing
● If we include the preprocessing steps in the TensorFlow graph, the
problem is solved
● Except training is slow
○ Preprocessing runs once per epoch instead of just once
98
RAW_DATA_FEATURE_SPEC = {
"name": tf.io.FixedLenFeature([], tf.string)
}
RAW_DATA_FEATURE_SPEC = {
"name": tf.io.FixedLenFeature([], tf.string)
}

RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))
RAW_DATA_FEATURE_SPEC = {
"name": tf.io.FixedLenFeature([], tf.string)
}

RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))

{
'_schema': feature {
name: "name"
type: BYTES
presence {
min_fraction: 1.0
}
shape {}
}
}
data_coder = tft.coders.ExampleProtoCoder(
RAW_DATA_METADATA.schema)
encoded = data_coder.encode({"name": "café"})
data_coder = tft.coders.ExampleProtoCoder(
RAW_DATA_METADATA.schema)
encoded = data_coder.encode({"name": "café"})

b'\n\x13\n\x11\n\x04name\x12\t\n\x07\n\x05caf\xc3\xa9'
data_coder = tft.coders.ExampleProtoCoder(
RAW_DATA_METADATA.schema)
encoded = data_coder.encode({"name": "café"})

b'\n\x13\n\x11\n\x04name\x12\t\n\x07\n\x05caf\xc3\xa9'

decoded = data_coder.decode(encoded)
data_coder = tft.coders.ExampleProtoCoder(
RAW_DATA_METADATA.schema)
encoded = data_coder.encode({"name": "café"})

b'\n\x13\n\x11\n\x04name\x12\t\n\x07\n\x05caf\xc3\xa9'

decoded = data_coder.decode(encoded)

{'name': b'caf\xc3\xa9'}
tmp_dir = tempfile.mkdtemp(prefix="tft-data")
train_path = os.path.join(tmp_dir, "train.tfrecord")

with beam.Pipeline() as pipeline:


_ = (pipeline
| "Create" >> beam.Create(["Alice", "Bob", "Cathy", "Alice"])
| "ToDict" >> beam.Map(lambda name: {"name": name})
| "Encode" >> beam.Map(data_coder.encode)
| "Write" >> beam.io.WriteToTFRecord(train_path)
)
tmp_dir = tempfile.mkdtemp(prefix="tft-data")
train_path = os.path.join(tmp_dir, "train.tfrecord")

with beam.Pipeline() as pipeline:


_ = (pipeline
| "Create" >> beam.Create(["Alice", "Bob", "Cathy", "Alice"])
| "ToDict" >> beam.Map(lambda name: {"name": name})
| "Encode" >> beam.Map(data_coder.encode)
| "Write" >> beam.io.WriteToTFRecord(train_path)
)

/tmp/tft-datac1z2ichz/train.tfrecord-00000-of-00001
eval_path = os.path.join(tmp_dir, "eval.tfrecord")

with beam.Pipeline() as pipeline:


_ = (pipeline
| "Create" >> beam.Create(["Denis", "Alice"])
| "ToDict" >> beam.Map(lambda name: {"name": name})
| "Encode" >> beam.Map(data_coder.encode)
| "Write" >> beam.io.WriteToTFRecord(eval_path)
)

/tmp/tft-datac1z2ichz/eval.tfrecord-00000-of-00001
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "Decode" >> beam.Map(data_coder.decode)
| "Print" >> beam.Map(print)
)
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "Decode" >> beam.Map(data_coder.decode)
| "Print" >> beam.Map(print)
)

{'name': b'Alice'}
{'name': b'Bob'}
{'name': b'Cathy'}
{'name': b'Alice'}
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{eval_path}*")
| "Decode" >> beam.Map(data_coder.decode)
| "Print" >> beam.Map(print)
)

{'name': b'Denis'}
{'name': b'Alice'}
def preprocessing_fn(inputs):
outputs = {}
lower = tf.strings.lower(inputs["name"])
outputs["name_xf"] = tft.compute_and_apply_vocabulary(lower)
return outputs
https://www.tensorflow.org/tfx/transform/api_docs

➔ Buckets
◆ apply_buckets()
◆ apply_buckets_with_interpolation()
◆ bucketize()
◆ bucketize_per_key()

➔ Text & Categories
◆ apply_vocabulary()
◆ bag_of_words()
◆ compute_and_apply_vocabulary()
◆ hash_strings()
◆ ngrams()
◆ vocabulary()
◆ word_count()
◆ tfidf()

➔ Math
◆ covariance()
◆ max()
◆ mean()
◆ min()
◆ pca()
◆ quantiles()
◆ scale_by_min_max()
◆ scale_by_min_max_per_key()
◆ scale_to_0_1()
◆ scale_to_0_1_per_key()
◆ scale_to_z_score()
◆ scale_to_z_score_per_key()
◆ size()
◆ sum()
◆ var()

➔ Misc
◆ deduplicate_tensor_per_row()
◆ get_analyze_input_columns()
◆ get_transform_input_columns()
◆ segment_indices()
◆ sparse_tensor_to_dense_with_shape()

➔ Apply arbitrary transformations
◆ apply_function_with_checkpoint()
◆ apply_pyfunc()
◆ apply_saved_model()
◆ ptransform_analyzer()
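To give a feel for how these analyzers combine in one preprocessing_fn, here is a hypothetical sketch using feature names from the news dataset (not the exact transform used in the labs):

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    outputs = {}
    # Full-pass analyzer: mean and stddev are computed once over the dataset,
    # then applied as constants inside the graph.
    outputs["n_hrefs_xf"] = tft.scale_to_z_score(inputs["n_hrefs"])
    # Quantile buckets for a skewed numeric feature.
    outputs["n_imgs_xf"] = tft.bucketize(inputs["n_imgs"], num_buckets=10)
    # Integer ids for a categorical feature.
    outputs["data_channel_xf"] = tft.compute_and_apply_vocabulary(inputs["data_channel"])
    return outputs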
with beam.Pipeline() as pipeline:

train_data = (pipeline
| "ReadTrain" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "DecodeTrain" >> beam.Map(data_coder.decode)
)
with beam.Pipeline() as pipeline:

train_data = (pipeline
| "ReadTrain" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "DecodeTrain" >> beam.Map(data_coder.decode)
)

train_dataset = (train_data, RAW_DATA_METADATA)


train_dataset_xf, transform_fn = (train_dataset
| tft.beam.AnalyzeAndTransformDataset(preprocessing_fn))
train_data_xf, metadata_xf = train_dataset_xf
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
train_data = (pipeline
| "ReadTrain" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "DecodeTrain" >> beam.Map(data_coder.decode)
)

train_dataset = (train_data, RAW_DATA_METADATA)


train_dataset_xf, transform_fn = (train_dataset
| tft.beam.AnalyzeAndTransformDataset(preprocessing_fn))
train_data_xf, metadata_xf = train_dataset_xf
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
train_data = (pipeline
| "ReadTrain" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "DecodeTrain" >> beam.Map(data_coder.decode)
)

train_dataset = (train_data, RAW_DATA_METADATA)


train_dataset_xf, transform_fn = (train_dataset
| tft.beam.AnalyzeAndTransformDataset(preprocessing_fn))
train_data_xf, metadata_xf = train_dataset_xf
data_xf_coder = tft.coders.ExampleProtoCoder(metadata_xf.schema)
_ = (train_data_xf
| 'EncodeTrainData' >> beam.Map(data_xf_coder.encode)
| 'WriteTrainData' >> beam.io.WriteToTFRecord(train_xf_path)
)
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
[...]
eval_data = (pipeline
| "ReadEval" >> beam.io.ReadFromTFRecord(f"{eval_path}*")
| "DecodeEval" >> beam.Map(data_coder.decode)
)
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
[...]
eval_data = (pipeline
| "ReadEval" >> beam.io.ReadFromTFRecord(f"{eval_path}*")
| "DecodeEval" >> beam.Map(data_coder.decode)
)
eval_dataset = (eval_data, RAW_DATA_METADATA)
eval_dataset_xf = ((eval_dataset, transform_fn)
| tft.beam.TransformDataset())
eval_data_xf, _ = eval_dataset_xf
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
[...]
eval_data = (pipeline
| "ReadEval" >> beam.io.ReadFromTFRecord(f"{eval_path}*")
| "DecodeEval" >> beam.Map(data_coder.decode)
)
eval_dataset = (eval_data, RAW_DATA_METADATA)
eval_dataset_xf = ((eval_dataset, transform_fn)
| tft.beam.TransformDataset())
eval_data_xf, _ = eval_dataset_xf
_ = (eval_data_xf
| 'EncodeEvalData' >> beam.Map(data_xf_coder.encode)
| 'WriteEvalData' >> beam.io.WriteToTFRecord(eval_xf_path)
)
with beam.Pipeline() as pipeline:
with tft.beam.Context(temp_dir=tmp_dir):
[...]
_ = (transform_fn
| 'WriteTransformFn' >> tft.beam.WriteTransformFn(graph_dir))
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{train_xf_path}*")
| "Decode" >> beam.Map(data_xf_coder.decode)
| "Print" >> beam.ParDo(print)
)
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{train_xf_path}*")
| "Decode" >> beam.Map(data_xf_coder.decode)
| "Print" >> beam.ParDo(print)
)

{'name_xf': 0}
{'name_xf': 2}
{'name_xf': 1}
{'name_xf': 0}
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{eval_xf_path}*")
| "Decode" >> beam.Map(data_xf_coder.decode)
| "Print" >> beam.ParDo(print)
)

{'name_xf': -1}
{'name_xf': 0}
metadata_xf.schema

feature {
name: "name_xf"
type: INT
int_domain {
is_categorical: true
}
presence {
min_fraction: 1.0
}
shape {}
}
/tmp/tft-data0o6lwwt0/graph/
    transform_fn/
        assets/
            vocab_compute_and_apply_vocabulary_vocabulary
        variables/
        saved_model.pb
    transformed_metadata/
        schema.pbtxt

Contents of the vocabulary asset file:
alice
cathy
bob
tft_output = tft.TFTransformOutput(graph_dir)

@tf.function
def transform_raw_features(example):
return tft_output.transform_raw_features(example)

example = {"name": tf.constant(["Alice", "Bob", "Alice", "Suzy"])}


example_xf = transform_raw_features(example)
tft_output = tft.TFTransformOutput(graph_dir)

@tf.function
def transform_raw_features(example):
return tft_output.transform_raw_features(example)

example = {"name": tf.constant(["Alice", "Bob", "Alice", "Suzy"])}


example_xf = transform_raw_features(example)

{'name_xf': <tf.Tensor: [...] numpy=array([ 0, 2, 0, -1])>}
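The same TFTransformOutput also exposes the transformed feature spec, which is handy for reading the transformed TFRecords back in training code. A minimal sketch (paths and batch size are illustrative):

import tensorflow as tf
import tensorflow_transform as tft

tft_output = tft.TFTransformOutput(graph_dir)
transformed_spec = tft_output.transformed_feature_spec()

# Batched dataset of transformed examples, parsed with the transformed spec.
dataset = tf.data.experimental.make_batched_features_dataset(
    file_pattern=f"{train_xf_path}*",
    batch_size=64,
    features=transformed_spec,
    reader=tf.data.TFRecordDataset,
    shuffle=True)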


transform = Transform(
input_data=example_gen.outputs['examples'],
schema=infer_schema.outputs['output'],
module_file='my_transform.py')
context.run(transform)
transform.outputs

{
'transform_output': Channel(
type_name: TransformPath
artifacts: [Artifact([...])]),
'transformed_examples': Channel(
type_name: ExamplesPath
artifacts: [Artifact([...], split: train),
Artifact([...], split: eval)])
}
transform = Transform(
input_data=example_gen.outputs['examples'],
schema=infer_schema.outputs['output'],
module_file='my_transform.py')
context.run(transform)
my_transform.py

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
outputs = {}
outputs["name_xf"] = tft.compute_[...](inputs["name"])
[...]
return outputs
Lab 5

Preprocessing Data
with TF Transform (TFT)
136
Analyzing Model Results
Understanding more than just the top level metrics

● Users experience model performance for their queries only


● Poor performance on slices of data can be hidden by top level metrics
● Model fairness is important
● Often key subsets of users or data are very important, and may be small
○ Performance in critical but unusual conditions
○ Performance for key audiences such as influencers

137
138
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(columns=['trip_start_hour']),
tfma.slicer.SingleSliceSpec(columns=['trip_start_day'])]
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(columns=['trip_start_hour']),
tfma.slicer.SingleSliceSpec(columns=['trip_start_day'])]

eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(columns=['trip_start_hour']),
tfma.slicer.SingleSliceSpec(columns=['trip_start_day'])]

eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(
    columns=['trip_start_day'],
    features=[('trip_start_hour', 12)])]
eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(
columns=['trip_start_day', 'trip_start_hour'])]

eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
output_dirs = [os.path.join("output", run_name)
for run_name in ("run_0", "run_1", "run_2")]

eval_results_from_disk = tfma.load_eval_results(
output_dirs, tfma.constants.MODEL_CENTRIC_MODE)
output_dirs = [os.path.join("output", run_name)
for run_name in ("run_0", "run_1", "run_2")]

eval_results_from_disk = tfma.load_eval_results(
output_dirs, tfma.constants.MODEL_CENTRIC_MODE)

● MODEL_CENTRIC_MODE: main axis = model id


● DATA_CENTRIC_MODE: main axis = last data span
output_dirs = [os.path.join("output", run_name)
for run_name in ("run_0", "run_1", "run_2")]

eval_results_from_disk = tfma.load_eval_results(
output_dirs, tfma.constants.MODEL_CENTRIC_MODE)

tfma.view.render_time_series(
eval_results_from_disk,
slices[0])
Lab 6

TensorFlow Model Analysis (TFMA)


147
TensorFlow Serving

148
Without a model server, every application bundles its own copy of the model (Application 1: Model v1, Application 2: Model v1, Application 3: Model v1). When a new version ships, each application has to be updated separately, so versions drift (e.g., Application 2 on Model v2 while the others are still on v1) until every application has been redeployed with Model v2.

With TF Serving, Applications 1, 2 and 3 all call a single TF Serving instance hosting Model v1. Updating the model in TF Serving (Model v1 → Model v2) upgrades every application at once.
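Once a model is behind TF Serving, applications typically call its REST (or gRPC) API. A sketch of a REST call from Python; the host, model name, and feature values are made up for illustration:

import json
import requests

instances = [{"n_hrefs": 12, "n_imgs": 3, "data_channel": "Tech"}]
response = requests.post(
    "http://localhost:8501/v1/models/news_popularity:predict",
    data=json.dumps({"instances": instances}))
print(response.json())   # e.g. {"predictions": [...]}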
Fairness
Lab 7

Fairness
157
from tfx.types import ComponentSpec
from tfx.types.component_spec import ChannelParameter
from tfx.types.component_spec import ExecutionParameter
from tfx.types.standard_artifacts import Examples

class DataAugmentationComponentSpec(ComponentSpec):
PARAMETERS = {
'max_rotation_angle': ExecutionParameter(type=float)
}
INPUTS = {
'input_data': ChannelParameter(type=Examples)
}
OUTPUTS = {
'augmented_data': ChannelParameter(type=Examples)
}
from tfx.components.base.base_executor import BaseExecutor
from tfx.types.artifact_utils import get_split_uri

class DataAugmentationExecutor(BaseExecutor):
def Do(self, input_dict, output_dict, exec_properties):
input_examples_uri = get_split_uri(
input_dict['input_data'], 'train')
output_examples_uri = get_split_uri(
output_dict['augmented_data'], 'train')
max_rotation_angle = exec_properties['max_rotation_angle']
[...]
[...]
decoder = tfdv.TFExampleDecoder()
with beam.Pipeline() as pipeline:
_ = (pipeline
| 'ReadTrainData' >> beam.io.ReadFromTFRecord(input_examples_uri)
| 'ParseExample' >> beam.Map(decoder.decode)
| 'Augmentation' >> beam.ParDo(_augment_image, **exec_properties)
| 'DictToExample' >> beam.Map(_dict_to_example)
| 'SerializeExample' >> beam.Map(lambda x: x.SerializeToString())
| 'WriteAugmentedData' >> beam.io.WriteToTFRecord(
os.path.join(output_examples_uri, "data_tfrecord"),
file_name_suffix='.gz'))
[...]
from tfx.components.base.base_component import BaseComponent
from tfx.components.base.executor_spec import ExecutorClassSpec

class DataAugmentationComponent(BaseComponent):
SPEC_CLASS = DataAugmentationComponentSpec
EXECUTOR_SPEC = ExecutorClassSpec(DataAugmentationExecutor)

def __init__(self, input_data, max_rotation_angle=10.,
             augmented_data=None, instance_name=None):
    augmented_data = [...]
    spec = DataAugmentationComponentSpec(
        input_data=input_data,
        max_rotation_angle=max_rotation_angle,
        augmented_data=augmented_data)
    super().__init__(spec=spec, instance_name=instance_name)
augmented_data = augmented_data or tfx.types.Channel(
type=Examples,
artifacts=[Examples(split="train"), Examples(split="eval")])
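For illustration, wiring the custom component defined above into a pipeline might look like this (a sketch, not code from the lab):

augmenter = DataAugmentationComponent(
    input_data=example_gen.outputs['examples'],
    max_rotation_angle=15.0)

# Run it interactively ...
context.run(augmenter)
# ... or add it to the components list of a pipeline.Pipeline.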
class MyCustomArtifact(tfx.types.artifact.Artifact):
TYPE_NAME = 'MyCustomArtifactPath'
Lab 8

Custom TFX Components


164
transform = Transform(...)

trainer1 = Trainer(
trainer_fn='trainer.trainer_fn1',
transformed_examples=transform.outputs.transformed_examples,
[...])

trainer2 = Trainer(
trainer_fn='trainer.trainer_fn2',
transformed_examples=transform.outputs.transformed_examples,
[...])
transform = Transform(...)

trainer1 = Trainer(
trainer_fn='trainer.trainer_fn1',
transformed_examples=transform.outputs.transformed_examples,
instance_name='Trainer1',
[...])

trainer2 = Trainer(
trainer_fn='trainer.trainer_fn2',
transformed_examples=transform.outputs.transformed_examples,
instance_name='Trainer2',
[...])

components = […, transform, trainer1, trainer2,…]


pipeline = Pipeline(components=components, …)
Lab 9

Alternate Pipeline Architectures


170
Neural Structured Learning
Training Neural Networks with Structured Signals

Arjun Gopalan
Software Engineer
171
How a Typical Neural Net Works

Labeled data → Neural Network: the network is trained on lots of labeled examples, each pairing an input (an image, say) with a label (Cat, Dog, ...).
Neural Structured Learning (NSL)

Structure + Neural Network

Concept: train the neural net using structure among samples. Training combines a few labeled examples (Cat, Dog, ...) with relations between examples, including unlabeled ones (?).
Structure Among Samples

e.g., similar images
[Source: graph concept is from Juan et al., arXiv’19. Original images are from pixabay.com]

Co-Occurrence Graph, Citation Graph, Text Graph
[Sources: graph concept is from Juan et al., WSDM’20, original images from pixabay.com; citation graph copied without modification from https://commons.wikimedia.org/wiki/File:Partial_citation_graph_for_%22A_screen_for_RNA-binding_proteins_in_yeast_indicates_dual_functions_for_many_enzymes%22_as_of_April_12,_2017.png; text graph from https://www.flickr.com/photos/marc_smith/6705382867/sizes/l/]
NSL: Advantages of Learning with Structure

Less Labeled Data Required (Neural Graph Learning)

Robust Model (Adversarial Learning)
Scenario I: Not Enough Labeled Data
Example task:
Document Classification

Lots of samples

Not enough labels

179
NSL: Advantages of Learning with Structure
Less Labeled Data Required

Input → Label: only a few examples are labeled (Computer Vision paper, NLP paper, ...); the rest are unlabeled (?).
Use relations between examples + few labeled examples.
NSL Resource: Tutorials
Scenario II: Model Robustness Required
Example task: Image Classification

[Source: Goodfellow, et al., ICLR’15]

182
NSL: Advantages of Learning with Structure
Robust Model

Input → Label: original image → Panda; perturbed image → Panda.
Use implicit structure derived from “adversarial” examples.
NSL Resource: Tutorials
NSL Framework
NSL: Neural Graph Learning

Graph + Neural Net
● Jointly optimizes both features & structured signals for better models

Neural Graph Machines (NGM)
Paper: Bui, Ravi & Ramavajjala [WSDM’18]
NSL: Neural Graph Learning

Joint optimization with label and structured signals. Optimize:

Supervised Loss + Neighbor Loss

● Supervised Loss compares the NN output for an input (the example features) against its label, using a loss function such as L2 (for regression) or Cross-Entropy (for classification).
● Neighbor Loss compares a target hidden layer across neighboring examples (the structured signals), using a distance metric such as L1, L2, ...
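Putting the two terms together, the Neural Graph Machines objective (Bui, Ravi & Ramavajjala, WSDM’18) is roughly of the form below; the weight α and the edge weights w_ij are not shown on the slide and come from the paper:

\mathcal{L} \;=\; \sum_{i \in \text{labeled}} \ell\big(f(x_i),\, y_i\big) \;+\; \alpha \sum_{(i,j) \in \mathcal{E}} w_{ij}\, d\big(h(x_i),\, h(x_j)\big)

where ℓ is the supervised loss, f(x_i) the NN output, h(x_i) the target hidden layer, d a distance metric, and E the set of neighbor edges.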
NSL: Neural Graph Learning Training Workflow

[Source: Juan, et al., WSDM’20]
NSL: Adversarial Learning

Adversarial + Neural Net
● Jointly optimize features from original examples (xi, xj) and “adversarial” examples (x’i, x’j) for more robust models

Adversarial Learning
Paper: Goodfellow, et al. [ICLR’15]
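The NSL API wraps adversarial learning the same way it wraps graph learning (the Keras graph-regularization example appears later in this deck). A minimal sketch of the adversarial wrapper, with an illustrative base model and config values:

import neural_structured_learning as nsl
import tensorflow as tf

# Illustrative base model; any sequential, functional, or subclass model works.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)])

adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=['label'], adv_config=adv_config)

adv_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
# Training data is fed as a dict holding both features and labels, e.g.:
# adv_model.fit({'feature': x_train, 'label': y_train}, batch_size=32, epochs=5)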
Libraries, Tools, and Trainers

Red segments represent NSL additions

Standalone Tool: build_graph, pack_nbrs
Graph Functions: build_graph, pack_nbrs, read_tsv_graph, write_tsv_graph, add_edge, add_undirected_edges
Lib: unpack_neighbor_features, adversarial_neighbor, replicate_embeddings, utils
Keras: graph_regularization, adversarial_regularization, Layers
Estimator: add_graph_regularization, add_adversarial_regularization
Web: tensorflow.org/neural_structured_learning
pip install neural-structured-learning
import neural_structured_learning as nsl

# Extract features required for the model from the input.  (Read Data)
train_dataset, test_dataset = make_datasets('/tmp/train.tfr',
                                            '/tmp/test.tfr')

# Create a base model -- sequential, functional, or subclass.  (Keras Model)
base_model = tf.keras.Sequential(...)

# Wrap the base model with graph regularization.  (Config, Graph Model)
graph_config = nsl.configs.GraphRegConfig(
    neighbor_config=nsl.configs.GraphNeighborConfig(max_neighbors=1))
graph_model = nsl.keras.GraphRegularization(base_model, graph_config)

# Compile, train, and evaluate.
graph_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])
graph_model.fit(train_dataset, epochs=5)
graph_model.evaluate(test_dataset)
What If No Explicit Structure or Graph?

Construct a graph via preprocessing: find neighbors using embeddings.
● Project documents into embeddings
● See if similarity is above the threshold
● If yes, add an edge making them neighbors
"""Generate embeddings."""

import neural_structured_learning as nsl
import tensorflow as tf
import tensorflow_hub as hub

# Load Data
imdb = tf.keras.datasets.imdb
(pp_train_data, pp_train_labels), (pp_test_data, pp_test_labels) = (
    imdb.load_data(num_words=10000))

# Pre-trained Embedding
pretrained_embedding = 'https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1'
hub_layer = hub.KerasLayer(
    pretrained_embedding, input_shape=[], dtype=tf.string,
    trainable=True)

# Generate embeddings.
record_id = int(0)
with tf.io.TFRecordWriter('/tmp/imdb/embeddings.tfr') as writer:
  for word_vector in pp_train_data:
    text = decode_review(word_vector)                               # Text to Embedding
    sentence_embedding = hub_layer(tf.reshape(text, shape=[-1,]))
    sentence_embedding = tf.reshape(sentence_embedding, shape=[-1])
    write_embedding_example(sentence_embedding, record_id)          # Lookup
    record_id += 1
"""Build graph and prepare graph input for NSL."""

# Build a graph from embeddings.  (Graph Building)
nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                      '/tmp/imdb/graph_99.tsv',
                      similarity_threshold=0.8)

# Create example features.  (Feature Definition)
next_record_id = create_examples(pp_train_data, pp_train_labels,
                                 '/tmp/imdb/train_data.tfr',
                                 starting_record_id=0)
create_examples(pp_test_data, pp_test_labels,
                '/tmp/imdb/test_data.tfr',
                starting_record_id=next_record_id)

# Augment training data by merging neighbors into sample features.  (Training Data Augmentation)
nsl.tools.pack_nbrs('/tmp/imdb/train_data.tfr', '',
                    '/tmp/imdb/graph_99.tsv',
                    '/tmp/imdb/nsl_train_data.tfr',
                    add_undirected_edges=True, max_nbrs=3)
"""Graph-regularized Keras model."""

# Extract features required for the model from the input.  (Read Data)
train_dataset, test_dataset = make_datasets('/tmp/imdb/nsl_train.tfr',
                                            '/tmp/imdb/test.tfr')

# Create a base model -- sequential, functional, or subclass.  (Keras Model)
base_model = tf.keras.Sequential(...)

# Wrap the base model with graph regularization.  (Config, Graph Model)
graph_config = nsl.configs.GraphRegConfig(
    neighbor_config=nsl.configs.GraphNeighborConfig(max_neighbors=2))
graph_model = nsl.keras.GraphRegularization(base_model, graph_config)

# Compile, train, and evaluate.
graph_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])
graph_model.fit(train_dataset, epochs=5)
graph_model.evaluate(test_dataset)
Recap
Training with structured signals is useful!
Less labeled data required
Robust model

Neural Structured Learning provides:


APIs for building Keras and Estimator models

TF libraries, tools, and tutorials for learning with structured signals

Support for all kinds of neural nets: feedforward, convolutional, or recurrent

213
Thank You!
Web: tensorflow.org/neural_structured_learning
Repo: github.com/tensorflow/neural-structured-learning
Survey: cutt.ly/nsl2019
Structure + Neural Network

Special acknowledgment: Google Expander team

Arjun Gopalan
[email protected]
Up Next: Hands-on TFX+NSL Tutorial
IMDB Reviews

Given the text of a review, predict whether it is POSITIVE or not (Label = True).
The TFX + NSL pipeline:

ExampleGen
  → Examples (text + label)
IdentifyExamples
  → Examples (text + label + id)
StatisticsGen
  → statistics
SchemaGen
  → schema
ExampleValidator
  → blessing
SynthesizeGraph (NSL, using TF Hub: 1. text → embeddings, 2. embeddings → synthesized_graph)
  → synthesized_graph
Transform
  → examples (id + text_xf + label_xf)
GraphAugmentation (examples + synthesized_graph)
  → examples (augmented with neighbors)
Trainer
                      Parses    Transforms    Expects Label    Expects augmented
train_input_fn          No          No             Yes               Yes
eval_input_fn           No          No             Yes               No
serving_receiver_fn     Yes         Yes            No                No
receiver_fn (TFMA)      Yes         Yes            Yes               No

receiver_fn (TFMA) must return both the raw features and the transformed features.
                      Parses    Transforms    Expects Label    Expects augmented
train_input_fn          No          No             Yes               Yes
eval_input_fn           No          No             Yes               Yes
serving_receiver_fn     Yes         Yes            No                Yes
receiver_fn (TFMA)      Yes         Yes            Yes               Yes

receiver_fn (TFMA) must return both the raw features and the transformed features.
Lab 10

NSL in TFX
233
Thank You!
