Production ML Pipelines With TensorFlow Extended - TFX - Presentation
Aurélien Géron
Consultant
@aureliengeron
What are we doing here?
What does it all mean?
In addition to training an amazing model ...
… a production solution requires so much more: the modeling code is only a small piece, surrounded by configuration, data collection, feature extraction, analysis tools, process management tools, machine resource management, and serving infrastructure.
Production Machine Learning
“Hidden Technical Debt in Machine Learning Systems”
NIPS 2015
http://bit.ly/ml-techdebt
Production Machine Learning
Machine Learning Development + Modern Software Development

Machine Learning Development:
● Labeled data
● Feature space coverage
● Minimal dimensionality
● Maximum predictive data
● Fairness
● Rare conditions
● Data lifecycle management

Modern Software Development:
● Extensibility
● Configuration
● Consistency & Reproducibility
● Modularity
● Best Practices
● Testability
● Monitoring
● Safety & Security
TensorFlow Extended (TFX)
Powers Alphabet’s most important bets and products
… and some of Google’s most important partners.
What we’re doing
We’re following a typical ML development process
● Understanding our data
● Feature engineering
● Training
● Analyze model performance
● Lather, rinse, repeat
● Ready to deploy to production
Data ingestion
Data validation (e.g., catch that age is missing)
Data transform (e.g., age ➜ normalized_age; China ➜ [1, 0, 0], India ➜ [0, 1, 0], USA ➜ [0, 0, 1])
Model training
Model analysis (track accuracy across model versions and compare each new version against the last)
Model serving (receive requests, return predictions)

Each step is backed by a TFX library: TF Data Validation (TFDV) for data validation, TF Transform (TFT) for data transform, and TF Serving (TFS) for model serving.
TFDV, TFT, TFMA, and TFS all work on shared data & models, with pipeline metadata kept in a SQL database.

The stack underneath:
Orchestration: in these labs, Manual w/ InteractiveContext
Metadata Store: SQL
Processing API: Apache Beam
Beam Runner: Google Cloud Dataflow, ... or, in these labs, the Local Runner
Exercise 1
Prerequisites
● Linux / MacOS
● Python 3.6
● Virtualenv
● Git
Step 1: Set up your environment
% sudo apt-get update
% cd
% virtualenv -p python3.6 tfx_env    # create the virtualenv (step implied by the prerequisites)
% source tfx_env/bin/activate
TFX End-to-End Example
Features
n_imgs
global_subjectivity
self_reference_avg_shares
global_sentiment_polarity
Label = n_shares
...
                 Parses   Transforms   Expects Label
train_input_fn   No       No           Yes
eval_input_fn    No       No           Yes
ML Coding vs ML engineering
ML Code is only a small box in the middle, surrounded by Data Collection, Data Verification, Configuration, Machine Resource Management, Analysis Tools, Serving Infrastructure, and Monitoring.
Adapted from: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
Writing Software (Programming)
... ...
Writing ML Software (The “Code” view)
... ...
This slide is, not surprisingly, the same as the previous one; however, it is only half the story :)
Engineering
// Strong Contracts.
Output Program(Inputs) {
... Human authored and peer reviewed code ...
}
TestProgramEdgeCase1...N() {
EXPECT_EQ(..., Program(...))
}
BenchmarkProgramWorstCase1...N {
...
}
Engineering vs ML Engineering
● Monolithic code, fixed datasets ➜ evolving datasets (data, features, ...) and objectives
● Non-reusable code, unmergeable artifacts ➜ reusable models aka modules, mergeable statistics, ...
● Untested code, non-validated datasets and models ➜ expectations, data validation, model validation, ...
● Unbenchmarked or hack-optimized-once code and models ➜ quality- and performance-benchmarked models, ...
● ...
This is the remaining half!
Introduction to Apache Beam
What is Apache Beam?
Write a pipeline once, in the SDK of your choice, e.g.

Python:  input | Sum.PerKey()
SQL:     SELECT key, SUM(value) FROM input GROUP BY key

... and run it on many runners: Apache Apex, Apache Samza, Apache Nemo (incubating), ⋮
Beam Portability Framework
● Currently most runners support the Java SDK only
● Portability framework (https://beam.apache.org/roadmap/portability/) aims to
provide full interoperability across the Beam ecosystem
● Portability API
○ Protobufs and gRPC for broad language support
○ Job submission and management: The Runner API
○ Job execution: The SDK harness
● Python Flink and Spark runners use Portability Framework
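In practice you target a specific runner through the pipeline options; a minimal sketch (DirectRunner shown here; FlinkRunner, SparkRunner, or DataflowRunner are specified the same way, each with its own extra options):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Select the runner explicitly; additional options depend on the runner.
options = PipelineOptions(runner='DirectRunner')
with beam.Pipeline(options=options) as pipeline:
    pass  # build the pipeline here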
Beam Portability Support Matrix
Hello World Example

import apache_beam as beam

pipeline = beam.Pipeline()
lines = (pipeline
         | beam.Create(["Hello", "World", "!!!"]))  # in-memory source (example data)
result = pipeline.run()
result.state

Hello World Example

# Equivalent, using the pipeline as a context manager:
with beam.Pipeline() as pipeline:
    lines = (pipeline
             | beam.Create(["Hello", "World", "!!!"]))
# pipeline.run() is implicit when the with-block exits
PCollection
● A distributed dataset your Beam pipeline operates on.
● The dataset can be bounded (from fixed source) or
unbounded (from a continuously updating source).
● The pipeline typically creates a source PCollection by
reading data from an external data source
○ But you can also create a PCollection from in-memory data within your driver program.
pipeline = beam.Pipeline()
lines = (pipeline
         | beam.Create(["Hello", "World", "!!!"]))  # lines is a PCollection
result = pipeline.run()
result.state
PTransform
● A PTransform represents a data processing operation,
or a step, in your pipeline.
● Every PTransform takes one or more PCollection
objects as input
● It performs a processing function that you provide on
the elements of that PCollection.
● It produces zero or more output PCollection objects.
PTransform

pipeline = beam.Pipeline()
lines = (pipeline
         | beam.Create(["Hello", "World", "!!!"])
         | beam.Map(print))   # Map is a PTransform applied to each element
result = pipeline.run()
result.state

Hello
World
!!!
I/O Transforms
● Beam comes with a number of “IO” library PTransforms.
● They read or write data to various external storage
systems.
I/O Transforms
with beam.Pipeline() as pipeline:
lines = (pipeline
| beam.io.ReadFromTFRecord("test_in.tfrecord")
| beam.Map(lambda line: line + b' processed')
| beam.io.WriteToTFRecord("test_out.tfrecord"))
Lab 2
Orchestration: Orchestrator (the pipeline defined in code below)
Metadata Store: SQLite
Processing API: Apache Beam
Beam Runner: Local Runner
Exercise 3
from tfx.orchestration import pipeline
from tfx.orchestration.metadata import sqlite_metadata_connection_config
pipeline.Pipeline(
pipeline_name=pipeline_name,
pipeline_root=pipeline_root,
components=[
example_gen, statistics_gen, infer_schema, validate_stats,
transform, trainer, model_analyzer, model_validator, pusher
],
enable_cache=True,
metadata_connection_config=sqlite_metadata_connection_config(
metadata_path),
additional_pipeline_args={},
)
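To execute this pipeline outside the notebook, hand it to an orchestrator. A minimal sketch using TFX's Beam-based runner (assuming the Pipeline object above is bound to a variable, here called tfx_pipeline):

from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

# Run the whole TFX pipeline locally on Beam's direct runner.
BeamDagRunner().run(tfx_pipeline)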
Lab 3
Data Exploration & Cleanup
The first task in any data science or ML project is to understand and clean the data
import tensorflow_data_validation as tfdv
train_stats = tfdv.generate_statistics_from_csv(
data_location=_train_data_filepath)
tfdv.visualize_statistics(train_stats)
tfdv.visualize_statistics(
lhs_statistics=eval_stats,
rhs_statistics=train_stats,
lhs_name='EVAL_DATASET',
rhs_name='TRAIN_DATASET')
schema = tfdv.infer_schema(statistics=train_stats)
tfdv.display_schema(schema=schema)
anomalies = tfdv.validate_statistics(
statistics=eval_stats,
schema=schema)
tfdv.display_anomalies(anomalies)
# Relax the minimum fraction of values that must come from
# the domain for feature company.
company = tfdv.get_feature(schema, 'company')
company.distribution_constraints.min_domain_mass = 0.9
serving_anomalies_with_env = tfdv.validate_statistics(
serving_stats, schema, environment='SERVING')
# Add skew comparator for 'weekday' feature.
weekday = tfdv.get_feature(schema, 'weekday')
weekday.skew_comparator.infinity_norm.threshold = 0.01
skew_anomalies = tfdv.validate_statistics(
train_stats,
schema,
previous_statistics=eval_stats,
serving_statistics=serving_stats)
Lab 4
Data Preprocessing
The raw data usually needs to be prepared before being fed to a Machine Learning model. This may involve several transformations: for example, normalizing numeric features, mapping categories to a vocabulary or one-hot vectors, or bucketizing continuous values.
Training/Serving Skew
● Preprocessing data before training
● Same preprocessing required at serving time
● Possibly with multiple serving environments
● Risk of discrepancy
In-model preprocessing
● If we include the preprocessing steps in the TensorFlow graph, the
problem is solved
● Except training is slow
○ Preprocessing runs once per epoch instead of just once
Training Serving
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

RAW_DATA_FEATURE_SPEC = {
    "name": tf.io.FixedLenFeature([], tf.string)
}
RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))
{
'_schema': feature {
name: "name"
type: BYTES
presence {
min_fraction: 1.0
}
shape {}
}
}
data_coder = tft.coders.ExampleProtoCoder(
RAW_DATA_METADATA.schema)
encoded = data_coder.encode({"name": "café"})
b'\n\x13\n\x11\n\x04name\x12\t\n\x07\n\x05caf\xc3\xa9'
decoded = data_coder.decode(encoded)
{'name': b'caf\xc3\xa9'}
tmp_dir = tempfile.mkdtemp(prefix="tft-data")
train_path = os.path.join(tmp_dir, "train.tfrecord")
eval_path = os.path.join(tmp_dir, "eval.tfrecord")

Resulting files (once the train and eval datasets are written):
/tmp/tft-datac1z2ichz/train.tfrecord-00000-of-00001
/tmp/tft-datac1z2ichz/eval.tfrecord-00000-of-00001
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "Decode" >> beam.Map(data_coder.decode)
| "Print" >> beam.Map(print)
)
{'name': b'Alice'}
{'name': b'Bob'}
{'name': b'Cathy'}
{'name': b'Alice'}
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{eval_path}*")
| "Decode" >> beam.Map(data_coder.decode)
| "Print" >> beam.Map(print)
)
{'name': b'Denis'}
{'name': b'Alice'}
def preprocessing_fn(inputs):
outputs = {}
lower = tf.strings.lower(inputs["name"])
outputs["name_xf"] = tft.compute_and_apply_vocabulary(lower)
return outputs
https://www.tensorflow.org/tfx/transform/api_docs
➔ Buckets
◆ apply_buckets()
◆ apply_buckets_with_interpolation()
◆ bucketize()
◆ bucketize_per_key()
➔ Text & Categories
◆ apply_vocabulary()
◆ bag_of_words()
◆ compute_and_apply_vocabulary()
◆ hash_strings()
◆ ngrams()
◆ vocabulary()
◆ word_count()
◆ tfidf()
➔ Math
◆ covariance()
◆ max()
◆ mean()
◆ min()
◆ pca()
◆ quantiles()
◆ scale_by_min_max()
◆ scale_by_min_max_per_key()
◆ scale_to_0_1()
◆ scale_to_0_1_per_key()
◆ scale_to_z_score()
◆ scale_to_z_score_per_key()
◆ size()
◆ sum()
◆ var()
➔ Misc
◆ deduplicate_tensor_per_row()
◆ get_analyze_input_columns()
◆ get_transform_input_columns()
◆ segment_indices()
◆ sparse_tensor_to_dense_with_shape()
➔ Apply arbitrary transformations
◆ apply_function_with_checkpoint()
◆ apply_pyfunc()
◆ apply_saved_model()
◆ ptransform_analyzer()
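An illustrative preprocessing_fn combining a few of these analyzers (the feature names here are hypothetical, not from the workshop dataset):

def preprocessing_fn(inputs):
    return {
        # Analyze mean/stddev over the whole dataset, then normalize.
        "age_z": tft.scale_to_z_score(inputs["age"]),
        # Compute quantile boundaries, then bucketize each value.
        "fare_bucket": tft.bucketize(inputs["fare"], num_buckets=4),
        # Build a vocabulary and map each category to its index.
        "country_id": tft.compute_and_apply_vocabulary(inputs["country"]),
    }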
with beam.Pipeline() as pipeline:
train_data = (pipeline
| "ReadTrain" >> beam.io.ReadFromTFRecord(f"{train_path}*")
| "DecodeTrain" >> beam.Map(data_coder.decode)
)
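The transformed examples below ({'name_xf': 0}, ...) come from running the preprocessing_fn through TF Transform's analyze-and-transform step; a minimal sketch of that step (assuming the tft_beam module and the RAW_DATA_METADATA defined earlier), placed inside the same Beam pipeline as the ReadTrain/DecodeTrain steps above:

import tensorflow_transform.beam as tft_beam

with tft_beam.Context(temp_dir=tmp_dir):
    # Analyze the training data (e.g., build the vocabulary) and transform it.
    (transformed_train, transformed_metadata), transform_fn = (
        (train_data, RAW_DATA_METADATA)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))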
{'name_xf': 0}
{'name_xf': 2}
{'name_xf': 1}
{'name_xf': 0}
with beam.Pipeline() as pipeline:
_ = (pipeline
| "Read" >> beam.io.ReadFromTFRecord(f"{eval_xf_path}*")
| "Decode" >> beam.Map(data_xf_coder.decode)
| "Print" >> beam.ParDo(print)
)
{'name_xf': -1}
{'name_xf': 0}
metadata_xf.schema
feature {
name: "name_xf"
type: INT
int_domain {
is_categorical: true
}
presence {
min_fraction: 1.0
}
shape {}
}
/tmp/tft-data0o6lwwt0/graph/
transform_fn/
assets/
vocab_compute_and_apply_vocabulary_vocabulary
variables
saved_model.pb
transformed_metadata/
schema.pbtxt
/tmp/tft-data0o6lwwt0/graph/
transform_fn/
assets/
vocab_compute_and_apply_vocabulary_vocabulary
variables
saved_model.pb
transformed_metadata/ alice
schema.pbtxt cathy
bob
/tmp/tft-data0o6lwwt0/graph/
transform_fn/
assets/
vocab_compute_and_apply_vocabulary_vocabulary
variables/
saved_model.pb
transformed_metadata/
schema.pbtxt
/tmp/tft-data0o6lwwt0/graph/
transform_fn/
assets/
vocab_compute_and_apply_vocabulary_vocabulary
variables/
saved_model.pb
transformed_metadata/
schema.pbtxt
tft_output = tft.TFTransformOutput(graph_dir)
@tf.function
def transform_raw_features(example):
return tft_output.transform_raw_features(example)
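A hypothetical call, assuming a batch of raw features matching the RAW_DATA_FEATURE_SPEC defined earlier:

raw = {"name": tf.constant(["Alice", "Denis"])}
print(transform_raw_features(raw))   # e.g. {'name_xf': <tf.Tensor ... [0, -1]>}, -1 being out-of-vocabulary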
{
'transform_output': Channel(
type_name: TransformPath
artifacts: [Artifact([...])]),
'transformed_examples': Channel(
type_name: ExamplesPath
artifacts: [Artifact([...], split: train),
Artifact([...], split: eval)])
}
transform = Transform(
input_data=example_gen.outputs['examples'],
schema=infer_schema.outputs['output'],
module_file='my_transform.py')
context.run(transform)
my_transform.py
import tensorflow as tf
import tensorflow_transform as tft
def preprocessing_fn(inputs):
outputs = {}
outputs["name_xf"] = tft.compute_[...](inputs["name"])
[...]
return outputs
Lab 5
Preprocessing Data
with TF Transform (TFT)
Analyzing Model Results
Understanding more than just the top-level metrics
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(columns=['trip_start_hour']),
tfma.slicer.SingleSliceSpec(columns=['trip_start_day'])]
eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(
columns=['trip_start_day'],
features=[('trip_start_hour', 12)])]
eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
eval_model = tfma.default_eval_shared_model(
eval_saved_model_path='eval/run0/eval_model/0')
slices = [tfma.slicer.SingleSliceSpec(
columns=['trip_start_day', 'trip_start_hour'])]
eval_result = tfma.run_model_analysis(
eval_shared_model=eval_model,
data_location='data.tfrecord',
file_format='tfrecords',
slice_spec=slices,
output_path='output/run0')
tfma.view.render_slicing_metrics(
eval_result,
slicing_spec=slices[0])
output_dirs = [os.path.join("output", run_name)
for run_name in ("run_0", "run_1", "run_2")]
eval_results_from_disk = tfma.load_eval_results(
output_dirs, tfma.constants.MODEL_CENTRIC_MODE)
tfma.view.render_time_series(
eval_results_from_disk,
slices[0])
Lab 6
Without a model server, each application embeds its own copy of the model: Application 1, Application 2, and Application 3 each start on Model v1, and rolling out Model v2 means updating the applications one at a time.

With TF Serving, the applications all call a shared serving instance, so moving from Model v1 to Model v2 happens in one place.
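A minimal sketch of how an application might query a TF Serving instance over its REST API (host, port, model name, and feature values are placeholders):

import json
import requests

# TF Serving exposes POST /v1/models/<model_name>:predict
instances = [{"trip_start_hour": 12, "trip_start_day": 3}]
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps({"instances": instances}))
print(response.json())   # {"predictions": [...]}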
Fairness
Lab 7
Fairness
from tfx.types import ComponentSpec
from tfx.types.component_spec import ChannelParameter
from tfx.types.component_spec import ExecutionParameter
from tfx.types.standard_artifacts import Examples
class DataAugmentationComponentSpec(ComponentSpec):
PARAMETERS = {
'max_rotation_angle': ExecutionParameter(type=float)
}
INPUTS = {
'input_data': ChannelParameter(type=Examples)
}
OUTPUTS = {
'augmented_data': ChannelParameter(type=Examples)
}
from tfx.components.base.base_executor import BaseExecutor
from tfx.types.artifact_utils import get_split_uri
class DataAugmentationExecutor(BaseExecutor):
def Do(self, input_dict, output_dict, exec_properties):
input_examples_uri = get_split_uri(
input_dict['input_data'], 'train')
output_examples_uri = get_split_uri(
output_dict['augmented_data'], 'train')
max_rotation_angle = exec_properties['max_rotation_angle']
[...]
[...]
decoder = tfdv.TFExampleDecoder()
with beam.Pipeline() as pipeline:
_ = (pipeline
| 'ReadTrainData' >> beam.io.ReadFromTFRecord(input_examples_uri)
| 'ParseExample' >> beam.Map(decoder.decode)
| 'Augmentation' >> beam.ParDo(_augment_image, **exec_properties)
| 'DictToExample' >> beam.Map(_dict_to_example)
| 'SerializeExample' >> beam.Map(lambda x: x.SerializeToString())
| 'WriteAugmentedData' >> beam.io.WriteToTFRecord(
os.path.join(output_examples_uri, "data_tfrecord"),
file_name_suffix='.gz'))
[...]
from tfx.components.base.base_component import BaseComponent
from tfx.components.base.executor_spec import ExecutorClassSpec
class DataAugmentationComponent(BaseComponent):
SPEC_CLASS = DataAugmentationComponentSpec
EXECUTOR_SPEC = ExecutorClassSpec(DataAugmentationExecutor)
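Hypothetical wiring of the custom component into the pipeline, assuming its constructor accepts the spec's inputs and parameters:

# Instantiate the custom component like any built-in TFX component.
augment = DataAugmentationComponent(
    input_data=example_gen.outputs['examples'],
    max_rotation_angle=15.0)
context.run(augment)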
transform = Transform(...)
trainer1 = Trainer(
trainer_fn='trainer.trainer_fn1',
transformed_examples=transform.outputs.transformed_examples,
[...])
trainer2 = Trainer(
trainer_fn='trainer.trainer_fn2',
transformed_examples=transform.outputs.transformed_examples,
[...])
Two components of the same type in one pipeline need distinct instance names:
transform = Transform(...)
trainer1 = Trainer(
trainer_fn='trainer.trainer_fn1',
transformed_examples=transform.outputs.transformed_examples,
instance_name='Trainer1',
[...])
trainer2 = Trainer(
trainer_fn='trainer.trainer_fn2',
transformed_examples=transform.outputs.transformed_examples,
instance_name='Trainer2',
[...])
Arjun Gopalan
Software Engineer
How a Typical Neural Net Works
Train on individual (input, label) samples, e.g., images labeled Cat or Dog.
Neural Structured Learning (NSL)
Concept: train a neural net using structure among samples, in addition to (input, label) pairs.
Structure Among Samples
[Source: graph concept is from Juan et al., arXiv’19. Original images are from pixabay.com]
NSL: Advantages of Learning with Structure
Less Labeled Data Required (Neural Graph Learning)
Scenario I: Not Enough Labeled Data
Example task: Document Classification
Lots of samples
NSL: Advantages of Learning with Structure
Less Labeled Data Required
NSL Resource: Tutorials
Scenario II: Model Robustness Required
Example task: Image Classification
NSL: Advantages of Learning with Structure
Robust Model
Use implicit structure derived from “adversarial” examples: an original image labeled Panda, and a perturbed image that should still be classified as Panda.
NSL Resource: Tutorials
NSL Framework
NSL: Neural Graph Learning
NSL: Neural Graph Learning
The training objective combines a supervised loss on the example features with a neighbor loss over graph edges.
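Schematically (not the exact notation from the slides), with α the neighbor-loss multiplier, w_ij the edge weights, and d a distance between the representations of neighboring samples:

$$\mathcal{L} \;=\; \sum_{i} \mathcal{L}_{\text{sup}}\big(y_i, f(x_i)\big) \;+\; \alpha \sum_{(i,j)\in\mathcal{E}} w_{ij}\, d\big(h(x_i), h(x_j)\big)$$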
NSL: Neural Graph Learning Training Workflow
NSL: Adversarial Learning
Adversarial neighbors x’i, x’j are generated by perturbing the inputs xi, xj.
Paper: Goodfellow, et al. [ICLR’15]
Libraries, Tools, and Trainers

Standalone Tool:
◆ build_graph
◆ pack_nbrs

Lib:
◆ unpack_neighbor_features
◆ adversarial_neighbor
◆ replicate_embeddings
◆ utils

Graph Functions:
◆ build_graph
◆ pack_nbrs
◆ read_tsv_graph
◆ write_tsv_graph
◆ add_edge
◆ add_undirected_edges

Keras:
◆ graph_regularization
◆ adversarial_regularization
◆ Layers

Estimator:
◆ add_graph_regularization
◆ add_adversarial_regularization
Web: tensorflow.org/neural_structured_learning
pip install neural-structured-learning
import neural_structured_learning as nsl
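Once installed, a Keras model can be wrapped with one of the trainers above; a minimal sketch using adversarial regularization (the configuration values are illustrative, and graph regularization is wrapped analogously):

import tensorflow as tf
import neural_structured_learning as nsl

# A plain Keras model...
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2)])

# ...wrapped so that adversarial perturbations are generated and penalized during training.
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(base_model, adv_config=adv_config)
adv_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])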
What If No Explicit Structure or Graph?
Construct a graph via preprocessing: compute an embedding for each sample, compare pairs of embeddings, and connect the samples whose embeddings are similar.
"""Generate embeddings."""
import tensorflow as tf
import tensorflow_hub as hub

# Load data
imdb = tf.keras.datasets.imdb
(pp_train_data, pp_train_labels), (pp_test_data, pp_test_labels) = (
    imdb.load_data(num_words=10000))

# Pre-trained embedding from TF Hub
pretrained_embedding = 'https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1'
hub_layer = hub.KerasLayer(
    pretrained_embedding, input_shape=[], dtype=tf.string,
    trainable=True)

# Generate embeddings.
record_id = int(0)
with tf.io.TFRecordWriter('/tmp/imdb/embeddings.tfr') as writer:
  for word_vector in pp_train_data:
    text = decode_review(word_vector)                              # word ids -> text
    sentence_embedding = hub_layer(tf.reshape(text, shape=[-1,]))  # text -> embedding
    sentence_embedding = tf.reshape(sentence_embedding, shape=[-1])
    write_embedding_example(sentence_embedding, record_id)         # helper defined in the tutorial
    record_id += 1
"""Build graph and prepare graph input for NSL."""

# Build a graph from embeddings: connect samples whose embeddings are similar.
nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                      '/tmp/imdb/graph_99.tsv',
                      similarity_threshold=0.8)
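The graph is then used to augment the training examples with their neighbors; a minimal sketch using pack_nbrs (the input/output file names are assumptions following the /tmp/imdb convention above):

# Join each labeled training example with up to 3 of its graph neighbors.
nsl.tools.pack_nbrs(
    '/tmp/imdb/train_data.tfr',     # labeled examples (assumed path)
    '',                             # no separate unlabeled examples in this sketch
    '/tmp/imdb/graph_99.tsv',       # the graph built above
    '/tmp/imdb/nsl_train_data.tfr',
    add_undirected_edges=True,
    max_nbrs=3)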
Thank You!
Web: tensorflow.org/neural_structured_learning
Repo: github.com/tensorflow/neural-structured-learning
Survey: cutt.ly/nsl2019
Special acknowledgment:
Google Expander team
Arjun Gopalan
[email protected]
Up Next: Hands-on TFX+NSL Tutorial
IMDB Reviews
Task: is a review POSITIVE? (e.g., Label = True)
Pipeline components for the TFX + NSL tutorial:
ExampleGen
IdentifyExamples
StatisticsGen ➜ statistics
SchemaGen ➜ schema
ExampleValidator ➜ blessing
SynthesizeGraph ➜ synthesized_graph
  (uses TF Hub: 1. text ➜ embeddings, 2. embeddings ➜ synthesized_graph via NSL)
Transform ➜ examples (text_xf + label_xf)
GraphAugmentation ➜ examples (augmented with neighbors)
                Parses   Transforms   Expects Label   Expects augmented
eval_input_fn   No       No           Yes             No
NSL in TFX
Thank You!