Tech Talks 10154


Hello.

My name is Steve, and I’m an engineer at Apple.


Hi.
I’m Paul.
I’m also an engineer.
In this video, we are going to walk you through a deep dive into one of the new
aspects of Core ML, converting PyTorch models to Core ML.
At WWDC 2020, we announced an overhaul to Core ML converters that improved many
aspects of the conversion process.
We’ve expanded support for the libraries most commonly used by the deep learning
community.
We’ve redesigned the converter architecture to improve user experience, leveraging
a new in-memory representation.
And we’ve unified the API so there’s a single call to invoke conversion from any
model source.
If you haven’t already seen it, I definitely recommend you check out the video
that goes into the details of this new converter architecture.
But in this video, I’m going to focus on model conversion, starting with a model
built in the PyTorch deep learning framework.
So maybe you’re an ML engineer who’s been hard at work training a model using
PyTorch.
Or maybe you’re an app developer who’s found a killer PyTorch model online and now
you want to drop that model into your app.
Now the question is how do you convert that PyTorch model into a Core ML model?
Well, the old Core ML converter required you to export your model to ONNX as a step
in the process.
And if you’ve used that converter, you might have run into some of its
limitations.
ONNX is an open standard, and so it can be slow to evolve and introduce new
features.
Compounding that, ML frameworks like PyTorch need time to add support for
exporting their latest model features to ONNX.
So with the old converter, you might have found yourself with a PyTorch model that
you couldn’t export to ONNX, blocking its conversion to Core ML.
Well, removing this extra dependency is just one of the things that’s changed in
the new Core ML converter.
So in this video, we’ll dig into the details of the brand-new PyTorch model
conversion path.
We’ll walk through the different ways of converting a PyTorch model into Core ML,
including some real-world conversion examples.
And finally, I’ll share some helpful tips for you to follow if you run into
trouble along the way.
So now let’s dive into the new conversion process.
Starting with the PyTorch model you want to convert, you’ll use PyTorch’s JIT
module to convert to a representation called TorchScript.
If you’re curious, JIT is an acronym that stands for Just In Time.
Then with a TorchScript model in hand, you’ll invoke the new Core ML converter to
generate an ML model which you can drop into your app.
Later in the video, I’ll dig into what that TorchScript conversion process looks
like.
But now let’s look at how the new Core ML converter works.
The converter is written in Python, and invoking it only takes a couple lines of
code.
You simply provide it with a model, which can either be a TorchScript object or
the path to one saved on disk, and a description of the inputs to the model.
You can also include some information about the outputs of the model, but that’s
optional.
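As a rough sketch (not the exact code shown in this video; the tiny model and the input name are illustrative stand-ins), invoking the converter looks something like this:

    import torch
    import coremltools as ct

    # A tiny illustrative model; it stands in for any TorchScript model.
    class TinyModel(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x)

    traced = torch.jit.trace(TinyModel().eval(), torch.rand(1, 3, 224, 224))

    # Provide the TorchScript object (or a path to one saved on disk) plus a
    # description of the inputs; output descriptions are optional.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=(1, 3, 224, 224))],
    )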
The converter works by iterating over the operations in the TorchScript graph and
converting them one by one to their Core ML equivalent.
Sometimes one TorchScript operation might get converted into multiple Core ML
operations.
Other times, the graph optimization pass might detect a known pattern and fuse
several operations into one.
Now, sometimes a model might include a custom operation that the converter doesn’t
understand.
But that’s okay.
The converter is designed to be extensible, so it’s easy to add definitions for
new operations.
In many cases, you can express that operation as a combination of existing ones, which we call a "composite op." But if that isn’t sufficient, you can also write a custom Swift implementation and target that during conversion.
I won’t get into the details of how to do that in this video, but please check out
our online resources for examples and walk-throughs.
Now that I’ve given an overview of the whole conversion process, it’s time to
circle back and dig into how to get a TorchScript model from your PyTorch model.
There are two ways PyTorch can do this.
The first is called "tracing," and the second is called "scripting." Let’s first look at what it means to trace a model.
Tracing is done by invoking the trace method of PyTorch’s JIT module, as shown in
this code snippet.
We pass in a PyTorch model along with an example input, and it returns the model in its TorchScript representation.
So what does this call actually do? The act of tracing runs an example input
through a forward pass of the model and captures the operations that are invoked as
the input makes its way through the model’s layers.
The collection of all those operations then becomes the TorchScript representation
of the model.
Now when you’re picking an example input to trace with, the best thing to use is
data similar to what the model will see during normal use.
For instance, you could use one sample of validation data or data captured the
same way your app will present it to the model.
You could also use random data.
If you do, make sure that the range of the input values and the shape of the
tensor is consistent with what the model expects.
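Here is a minimal sketch of tracing with a representative input; the torchvision model and the input shape are illustrative stand-ins:

    import torch
    import torchvision

    # Trace by running one representative input through the model's forward pass.
    model = torchvision.models.resnet18(pretrained=True).eval()

    # Ideally use real data (e.g. one validation sample). Random data also works,
    # as long as the tensor shape and value range match what the model expects.
    example_input = torch.rand(1, 3, 224, 224)   # values in [0, 1], image-sized
    traced_model = torch.jit.trace(model, example_input)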
Let’s make all of this a little more concrete by working through an example.
I’d like to introduce my colleague Paul, who will take you through the full
process of converting a segmentation model from PyTorch to Core ML.
Thanks, Steve.
Suppose I have a segmentation model, and I would like it to run on-device.
If you aren’t familiar with what a segmentation model does, it takes an image and
assigns a class probability score to each pixel of that image.
So how would I get my model running on-device? I’m going to convert my model into
a Core ML model.
To do this, I first trace my PyTorch model to turn it into TorchScript form using
PyTorch’s JIT tracing module.
Then I use the new Core ML converter to convert the TorchScript model into a Core
ML model.
Finally, I will show off how the resulting Core ML model integrates seamlessly
into Xcode.
Let’s see what this process looks like in code.
In this Jupyter Notebook, I will convert my PyTorch segmentation model, mentioned
in the slides, into a Core ML model.
If you’d like to try this code for yourself, it is available in the code snippet
associated with this video.
First, I import some dependencies that I will use for this demo.
Next, I load in the ResNet-101 segmentation model from torchvision and a sample
input: in this case, an image of a dog and cat.
PyTorch models take in tensor objects, not PIL Image objects.
So I convert the image to a tensor with transforms.ToTensor().
The model also expects an extra dimension in the tensor denoting the batch size,
so I add that in as well.
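A sketch of roughly what this setup can look like, assuming the torchvision ResNet-101 segmentation model is deeplabv3_resnet101 and using a placeholder image path:

    import torch
    import torchvision
    from PIL import Image
    from torchvision import transforms

    # torchvision's DeepLabV3 model with a ResNet-101 backbone; the image path
    # is a placeholder.
    model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

    input_image = Image.open("dog_and_cat.jpg")
    to_tensor = transforms.ToTensor()            # PIL image -> float tensor in [0, 1]
    input_tensor = to_tensor(input_image)
    input_batch = input_tensor.unsqueeze(0)      # add the batch dimension the model expects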
As mentioned in the slides, the Core ML converter accepts a TorchScript model.
To obtain this, I use the torch.jit module’s trace method, which converts a PyTorch model to a TorchScript model.
Uh-oh.
Tracing has thrown an exception.
As it says in the exception message, "Only tensors or tuples of tensors can be output from traced functions." This is a limitation of PyTorch’s JIT module.
The problem here is that my model is returning a dictionary.
I solve this by wrapping my model in a PyTorch module that extracts only the
tensor value from the output dictionary.
Here I declare my class wrapper that inherits from PyTorch’s module class.
I define a model attribute which contains ResNet-101, as used above.
In the forward method of this wrapping class, I index the returned dictionary with
the key named "out" and return just the tensor output.
Now that the model returns a tensor and not a dictionary, it will successfully
trace.
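A hedged sketch of that wrapper, continuing from the setup sketch above:

    import torch
    import torchvision

    # Wrap the model so it returns only the tensor stored under the "out" key,
    # since tracing can only handle tensors or tuples of tensors.
    class WrappedDeeplabv3Resnet101(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.model = torchvision.models.segmentation.deeplabv3_resnet101(
                pretrained=True).eval()

        def forward(self, x):
            result = self.model(x)
            return result["out"]

    # input_batch is the tensor prepared in the earlier sketch
    traceable_model = WrappedDeeplabv3Resnet101().eval()
    trace = torch.jit.trace(traceable_model, input_batch)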
It is now time for me to utilize the new Core ML converter.
First, I need to define my input and its preprocessing.
I define my input as an ImageType with preprocessing that normalizes the image with
ImageNet statistics and scales its values down to lie between 0 and 1.
This preprocessing is what ResNet-101 expects.
Next, I simply call the Core ML tools convert method, passing in the TorchScript
model and the input definition.
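Continuing the sketches above, the conversion call can look roughly like this; treat the exact scale and bias values as an assumption that folds the usual ImageNet normalization and 0-to-1 scaling into Core ML’s image preprocessing:

    import coremltools as ct

    # Fold the usual ImageNet normalization (and the 0-1 scaling) into Core ML's
    # image preprocessing; treat these exact numbers as an assumption.
    scale = 1.0 / (0.226 * 255.0)
    bias = [-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225]

    mlmodel = ct.convert(
        trace,                                    # the traced model from the sketch above
        inputs=[ct.ImageType(name="input", shape=input_batch.shape,
                             scale=scale, bias=bias)],
    )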
After conversion, I’ll set the metadata of my model so it can be understood by
other programs such as Xcode.
I set the type of my model to segmentation and enumerate the classes in my model’s
order.
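A hedged sketch of setting that metadata, assuming the Xcode model-preview metadata keys and showing only the first few class labels:

    import json

    # Metadata keys follow the Xcode model-preview convention; only the first few
    # class labels are shown, listed in the model's own order.
    labels = ["background", "aeroplane", "bicycle", "bird", "boat"]
    mlmodel.user_defined_metadata["com.apple.coreml.model.preview.type"] = "imageSegmenter"
    mlmodel.user_defined_metadata["com.apple.coreml.model.preview.params"] = json.dumps(
        {"labels": labels})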
So, does my converted model work? I can easily visualize my model’s output through
Xcode.
First, I’ll save my model.
Now all I need to do is click on my saved model in Finder, and it will be opened by
Xcode.
Here I can view its metadata, including input shapes and types.
To visualize the model’s output, I’ll go to the Preview tab and drag in my sample
image of a dog and cat.
Looks like my model is successfully segmenting the pets in this image.
ResNet-101 could be traced, but not every model can simply be traced.
To explain how to get these other models to convert, I’ll kick it back to Steve.
Thanks, Paul.
Okay.
I think we have a pretty good handle on how conversion works using tracing.
But PyTorch offers a second way to get TorchScript.
So now let’s dig into that one, which is called "scripting." Scripting works by taking a PyTorch model and directly compiling it into TorchScript operations.
Remember, tracing captured the model as data flowed through it.
But like tracing, scripting a model is also really easy.
Simply invoke the script method of PyTorch’s JIT module and provide it with a
model.
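A minimal sketch of scripting; the toy model here is illustrative:

    import torch

    # Scripting compiles the model's code directly, so control flow like this
    # loop is preserved in the TorchScript representation.
    class LoopModel(torch.nn.Module):
        def forward(self, x):
            for _ in range(3):
                x = x + 1
            return x

    scripted_model = torch.jit.script(LoopModel())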
Okay.
I’ve shown you two different ways to get a TorchScript representation, and you
might be wondering when to use one versus the other.
One case where you must use scripting is if the model includes control flow.
Let’s look at an example to understand why.
Here, this model has branches and loops, and scripting will capture all of it
because it is directly compiling the model.
If we traced the model, what we get is only the path through the model for the
given input, which you can see doesn’t capture the whole model.
If you do need to script a model, you’ll usually get the best results if you trace
as much of the model as possible and script only the parts of the model that need
it.
This is because tracing usually produces a simpler representation than scripting
does.
Let’s see how to apply this idea by looking at some code.
In this example, I’ve got a model that runs some chunk of code a fixed number of
times inside a loop.
I’ve isolated the body of the loop into something that can easily be traced on its
own, and then I can apply scripting to the model as a whole.
What we’re basically doing is limiting the scripting to just the bits of control
flow that need it and then tracing everything else.
This mixing of tracing and scripting works because they both will skip over code
that’s already been converted to TorchScript.
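A hedged sketch of that pattern, with an illustrative loop body and wrapper:

    import torch

    # Trace the loop body on its own...
    class LoopBody(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x) + 1

    traced_body = torch.jit.trace(LoopBody(), torch.rand(1, 8))

    # ...then script only the wrapper that holds the control flow. Both tracing
    # and scripting skip over code that is already TorchScript, so the traced
    # body is used as-is inside the scripted model.
    class FullModel(torch.nn.Module):
        def __init__(self, body):
            super().__init__()
            self.body = body

        def forward(self, x):
            for _ in range(4):
                x = self.body(x)
            return x

    scripted_model = torch.jit.script(FullModel(traced_body))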
Now it’s time to go through a concrete example that uses scripting.
I’ll hand it back over to Paul, who will walk you through converting a language
model.
Hey there.
Suppose I have a sentence completion model that I want to convert into a Core ML
model so it can run on-device.
For some context, sentence completion is a task that involves taking a sentence
fragment and using a model to predict the words that are likely to come after it.
So what does this look like in terms of computation steps? I’ll start with a few
words of a sentence fragment and pass them through what’s called an encoder that
translates those words into a representation my model can understand.
In this case, a sequence of integer tokens.
Next, I’ll pass that sequence of tokens into my model, which will predict the next
token in the sequence.
I will continue feeding my model the partially constructed sentence, appending new
tokens to the end until my model predicts a special end-of-sentence token, which
means my sentence is completed.
Now that I have a complete sentence of tokens, I’ll pass it through a decoder,
which converts the tokens back into words.
The middle part of this diagram, completing the list of tokens, is what I will
convert into a Core ML model.
The encoder and decoder are handled separately.
Let’s make sure we understand what’s going on by looking at some pseudo-code.
The core of my model is my next token predictor.
For this, I will use Hugging Face’s GPT2 model.
The predictor takes a list of tokens as inputs and gives me a prediction for the
next token.
Next, I’ll wrap some control flow around the predictor to continue until I see the
end-of-sentence token.
Inside the loop, I append the predicted token to the running list and use that as
the input to my predictor on every loop.
When my predictor returns the end-of-sentence token, I’ll return the complete
sentence for decoding.
Now to see this whole process in code, let’s dive into the Jupyter Notebook.
In this notebook, I’ll construct a language model that takes a sentence fragment
and completes the sentence.
Let’s get the imports out of the way.
Here is the code for my model.
My model inherits from torch.nn.Module and contains attributes for the end-of-sentence token, the next_token_predictor model, and the default token denoting the beginning of a sentence.
In its forward method, just like in the slides, I’ve written a loop body that
takes a list of tokens and predicts the next one.
The loop continues until the end-of-sentence token is generated.
When this happens, we will return the sentence.
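A hedged sketch of such a wrapper. It assumes the predictor behaves like the GPT2 model traced in the next step: called on a 1-D tensor of token ids, it returns a tuple whose first element holds next-token logits for every position. The end-of-sentence and start token ids are illustrative placeholders:

    import torch

    class FinishMySentence(torch.nn.Module):
        def __init__(self, predictor, eos_token: int = 198):
            super().__init__()
            self.eos = torch.tensor([eos_token])
            self.next_token_predictor = predictor
            self.default_token = torch.tensor([0])

        def forward(self, tokens):
            sentence = tokens
            token = self.default_token
            # keep predicting and appending until the end-of-sentence token appears
            while token != self.eos:
                predictions = self.next_token_predictor(sentence)[0]
                token = torch.argmax(predictions[-1, :], dim=0, keepdim=True)
                sentence = torch.cat((sentence, token), 0)
            return sentence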
As mentioned, my next token predictor will be GPT2, which will reside in the loop
body.
I’m going to follow the practice of tracing the loop body separate from scripting
the whole model.
So I’ll run the JIT tracer on only my next token predictor.
It takes a list of tokens as inputs, so for tracing, I’ll just pass in a list of
random tokens.
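A sketch of what that tracing step can look like, assuming the Hugging Face transformers package; the token range and sequence length are illustrative:

    import torch
    from transformers import GPT2LMHeadModel

    # Load GPT2 as the next-token predictor and trace it with a short list of
    # random token ids.
    token_predictor = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True).eval()
    random_tokens = torch.randint(10000, (5,))
    traced_token_predictor = torch.jit.trace(token_predictor, random_tokens)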
I can see that the tracer emitted a warning telling me this trace might not
generalize to other inputs.
Note that this warning is from PyTorch’s JIT tracer, not Core ML.
I’ll explain what’s going on here in the troubleshooting section later, but for now I’ll ignore this warning since there isn’t actually a problem.
With the bulk of the loop body traced, I can instantiate my sentence finishing
model and apply the JIT scripter to prepare it for conversion to Core ML.
Now with my TorchScript model, I convert it to a Core ML model just like in the
segmentation demo.
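Continuing the sketches above, scripting and converting can look roughly like this; treat the flexible sequence length and the int32 dtype as assumptions:

    import numpy as np
    import torch
    import coremltools as ct

    # Script the wrapper that holds the control flow (the traced GPT2 inside it
    # is used as-is), then convert.
    model = FinishMySentence(predictor=traced_token_predictor)
    scripted_model = torch.jit.script(model)

    mlmodel = ct.convert(
        scripted_model,
        inputs=[ct.TensorType(name="context",
                              shape=(ct.RangeDim(1, 64),),
                              dtype=np.int32)],
    )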
Now I’ll see if my model can finish a sentence.
I create a sentence fragment: in this case, "The Manhattan Bridge is." Then I run it through GPT2’s included encoder to get the fragment’s encoding, and then convert that list of tokens into a Torch tensor.
Next, I package up the input for my Core ML model, run the model, and decode the outputs with GPT2’s included decoder.
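A hedged sketch of that round trip; the output is read positionally because the converted model’s output name is not fixed here:

    import torch
    from transformers import GPT2Tokenizer

    # Encode the fragment, run the converted Core ML model, and decode the result.
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    context = torch.tensor(tokenizer.encode("The Manhattan Bridge is"))

    prediction = mlmodel.predict({"context": context.numpy().astype("int32")})
    generated_tokens = list(prediction.values())[0]
    print(tokenizer.decode([int(t) for t in generated_tokens]))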
Nice.
The Core ML model was able to complete the sentence.
Looks like it generated a statement about the Manhattan Bridge.
You might run into bumps along the road as you trace and script your models to get
them into a Core ML format.
I’ll hand it back to Steve to help you along the way.
Before we wrap up, I want to review the snags we hit when converting PyTorch models
to Core ML and go over some troubleshooting tips and best practices.
Thinking back to the segmentation demo, remember we encountered an error during
tracing.
This was because our model returned a dictionary and JIT tracing can only handle
tensors or tuples of tensors.
The solution we showed in the demo was to create a thin wrapper around the model
that unpacks the model’s native outputs.
Remember, in this example, the model returned a dictionary, so here we’re
accessing the dictionary key that represents the inference result and returning
that tensor.
Of course, this idea also works if we want to access and return multiple items
from the dictionary or if we need to unpack other types of containers.
Now during the language model demo, we encountered a tracer warning that said the
trace might not generalize to other inputs.
And we see the tracer helpfully print the troublesome line of code.
So what’s actually going on? If we look at the model source code to understand the
warning, we see that the model is slicing one tensor based on the size of another
tensor.
Getting the size of a tensor results in a bare Python value-- in other words, not
a PyTorch tensor-- and the tracer is warning that it can’t trace the math
operations being performed on these bare Python values.
However, in this case the tracer is a little too aggressive in emitting this
warning, and there isn’t actually a problem.
A good rule of thumb when it comes to tracing code that operates on bare Python
values is that only built-in Python operations will be captured correctly by the
tracer.
Here are a few examples to help explain this idea.
Let’s think through these and figure out, based on that rule of thumb, if they
will be traced correctly or not.
The first example is very similar to what we saw during the demo and will result
in a correct trace since a built-in operation, in this case addition, is being
applied.
The second example also will trace correctly, in this case using the modulo
operator, which again is a built-in operation.
But the third example won’t trace correctly.
The JIT tracer doesn’t know what the library function math.sqrt does, so the traced graph will record a constant value instead of the operations to compute the tensor size and its square root. But with a simple fix to the model, replacing math.sqrt with Python’s built-in power operator, this will result in a correct trace.
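A hedged sketch of those three cases in one toy module, mirroring the "slice one tensor based on the size of another" pattern from the warning:

    import torch

    class SliceBySize(torch.nn.Module):
        def forward(self, x, y):
            n = y.shape[0]            # a bare Python value, not a tensor
            a = x[: n + 1]            # built-in addition: captured correctly
            b = x[: n % 4 + 1]        # built-in modulo: captured correctly
            # math.sqrt is a library call the tracer can't follow, so its result
            # would be recorded as a constant:
            #     c = x[: int(math.sqrt(n))]
            # the built-in power operator is the simple fix:
            c = x[: int(n ** 0.5)]
            return a, b, c

    traced = torch.jit.trace(SliceBySize(), (torch.rand(10), torch.rand(9)))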
Now let’s look at a case where scripting a model can fail.
This model starts with an empty list and successively appends a fixed set of
integers to it.
Keep in mind this isn’t a terribly useful model.
I’m just using it to illustrate a failure condition.
If I script this model, I’ll get a runtime error that hints at a type mismatch.
The JIT scripter needs type information to turn a model into TorchScript and does
a pretty good job inferring object types from context.
However, there are times when that’s not possible, and if the scripter can’t
figure out an object’s type, it assumes the object is a tensor.
In this case, it’s assuming this list is a list of tensors while it’s actually
being built as a list of integers.
So what can I do to help the scripter out? Well, I can either include meaningful
initialization of the variable or I can use type annotations.
Here, I’ve adjusted the model to show examples of both.
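A hedged sketch showing both fixes on the toy list-building model:

    import torch
    from typing import List

    class BuildsAList(torch.nn.Module):
        def forward(self, x):
            # fix 1: a type annotation tells the scripter this is a list of ints
            values: List[int] = []
            # fix 2 (alternative): meaningful initialization, e.g. values = [0],
            # would also let the scripter infer the element type
            for i in range(5):
                values.append(i)
            return torch.tensor(values) + x

    scripted = torch.jit.script(BuildsAList())
    print(scripted(torch.zeros(5)))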
There’s one last thing I want to mention.
You always want to make sure your model is in evaluation mode before tracing.
This ensures that all the layers are configured for inference rather than
training.
For most layers, this doesn’t matter.
But, for example, if you have a dropout layer in your model, setting evaluation
mode will make sure it’s disabled.
And when the converter encounters operations that have been disabled, it will
treat them as pass-through operations.
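As a quick sketch, that just means calling eval() before tracing; the toy model is illustrative:

    import torch

    # Put the model in evaluation mode before tracing so layers like dropout are
    # disabled and convert as pass-through operations.
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(0.5))
    model.eval()
    traced = torch.jit.trace(model, torch.rand(1, 8))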
We’ve covered a lot of material in this video, but you can find even more
information in the links associated with the video, including the Core ML converter
documentation, information about custom op conversion and many detailed TorchScript
examples.
We’re really excited to provide first-class support for converting PyTorch models.
I hope you’ll find that the new Core ML converter will enable broader support for
your PyTorch models, empower you to have optimized on-device model execution and
really provide you with maximum support to get your model converted easily.
Thanks for watching.
