0% found this document useful (0 votes)

30 views

Talk Pytorch

The document provides an overview of PyTorch internals including tensors, the just-in-time compiler, and production tips. Tensors in PyTorch are multi-dimensional matrices containing elements of a single data type. All PyTorch computation is implemented in C++ with Python bindings. The ATen library provides foundational tensor operations and Autograd enables automatic differentiation.

Uploaded by

dung

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

30 views

Talk Pytorch

Uploaded by

dung

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 105

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood

A guide to understand PyTorch internals

Christian S. Perone
([email protected])
http://blog.christianperone.com

PyData Montreal, Feb 2019

PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

Agenda
TENSORS
Tensors
Python objects
Zero-copy
Tensor storage
Memory allocators (CPU/GPU)
The big picture
JIT
Just-in-time compiler
Tracing
Scripting
Why TorchScript ?
Building IR and JIT Phases
Optimizations
Serialization
Using models in other languages
PRODUCTION
Some tips
Q&A
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

WHO AM I

É Christian S. Perone
É 14 years working with Machine
Learning, Data Science and Software
Engineering in industry R&D
É Blog at
É blog.christianperone.com
É Open-source projects at
É https://github.com/perone
É Twitter @tarantulae

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DISCLAIMER
É PyTorch is a moving target, Deep Learning ecosystem moves
fast and big changes happens every week;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DISCLAIMER
É PyTorch is a moving target, Deep Learning ecosystem moves
fast and big changes happens every week;
É This is not a talk to teach you the basics of PyTorch or how to
train your network, but to teach you how PyTorch
components works under the hood in a intuitive way;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Section I

[ TENSORS \

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.]
[ 1., -1.]])
>>> t.dtype # They have a type
torch.float32
>>> t.shape # a shape
torch.Size([2, 2])

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS
É Although PyTorch has an elegant python first design, all PyTorch
heavy work is actually implemented in C++.
É In Python, the integration of C++ code is (usually) done using
what is called an extension;
É PyTorch uses ATen, which is the foundational tensor operation
library on which all else is built;
É To do automatic differentiation, PyTorch uses Autograd, which
is an augmentation on top of the ATen framework;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct { typedef struct _object {

PyObject_HEAD Py_ssize_t ob_refcnt;
double ob_fval; struct _typeobject *ob_type;
} PyFloatObject; } PyObject;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct { typedef struct _object {

PyObject_HEAD Py_ssize_t ob_refcnt;
double ob_fval; struct _typeobject *ob_type;
} PyFloatObject; } PyObject;

PyFloatObject object PyObject object

PyObject_HEAD Py_ssize_t ob_refcnt

double ob_fval struct _typeobject *ob_type

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

struct THPVariable {
PyObject_HEAD
torch::autograd::Variable cdata;
PyObject* backward_hooks;
};

The TH prefix is from TorcH, and P means Python. PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

struct THPVariable {
PyObject_HEAD
torch::autograd::Variable cdata;
PyObject* backward_hooks;
};

Ref Count = 1
variable_a
THPVariable object

PyObject_HEAD (w/ ref counter)

(object fields)
variable_b
Ref Count = 2

The TH prefix is from TorcH, and P means Python. PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

IN PYTHON, EVERYTHING IS AN OBJECT

>>> a = 300
>>> b = 300
>>> a is b
False

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

IN PYTHON, EVERYTHING IS AN OBJECT

>>> a = 300
>>> b = 300
>>> a is b
False
>>> a = 200
>>> b = 200
>>> a is b
True

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

IN PYTHON, EVERYTHING IS AN OBJECT

PyIntObject object
>>> a = 300 a
Ref Count = 1
PyObject_HEAD
>>> b = 300 (object fields)

>>> a is b PyIntObject object

Ref Count = 1
False b PyObject_HEAD
(object fields)

>>> a = 200
Ref Count = 1
>>> b = 200 a
PyIntObject object
>>> a is b PyObject_HEAD
True (object fields)
b
Ref Count = 2

A typical Python program spend much of its time

allocating/deallocating integers. CPython then caches the small
integers.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])

Underline after an operation means an in-place operation. PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)

Underline after an operation means an in-place operation. PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
É Now imagine that you have a batch of 128 images, 3 channels
each (RGB) and with size of 224x224;

Column
Channel
1 1
1 1
0 1 1
1 1 0 1 0
0 1 1 0
Row 1 0 1 1 00 1 1
1 1 1 0
0 0 0 1 01 0 0
0 0
1 1 10 0 0
1 1
0 1

É This will yield a size in memory of ∼ 74MB. We don’t want to

duplicate memory (except when copying them to discrete GPUs
of course);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
>>> np_array
array([[2., 2.],
[2., 2.]])

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS
at::Tensor tensor_from_numpy(PyObject* obj) {
// (...) - omitted for brevity
auto array = (PyArrayObject*)obj;
int ndim = PyArray_NDIM(array);
auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
// (...) - omitted for brevity
void* data_ptr = PyArray_DATA(array);
auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
Py_INCREF(obj);
return type.tensorFromBlob(data_ptr, sizes, strides,
[obj](void* data) {
AutoGIL gil;
Py_DECREF(obj);
});
}

Pay attention to the reference counting using Py_INCREF() and the

call to tensorFromBlob() function. PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

DATA POINTERS

FloatTensor object PyArrayObject object

data_pointer* data_pointer*
(object fields) (object fields)

The tensor FloatTensor did a copy of the numpy array data

pointer and not of the contents. The reference is kept safe by the
Python reference counting mechanism.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .
struct C10_API StorageImpl final : (...) {
// (...)
private:
// (...)
caffe2::TypeMeta data_type_;
DataPtr data_ptr_;
int64_t numel_;
Allocator* allocator_;
}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

É Holds a pointer to the raw data and contains information such as

the size and allocator;
É Storage is a dumb abstraction, there is no metadata telling us
how to interpret the data it holds;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE
É The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE
É The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
É We can have multiple tensors sharing the same storage, but
with different interpretations, also called views, but without
duplicating memory:
>>> tensor_a = torch.ones((2, 2))
>>> tensor_b = tensor_a.view(4)
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

É tensor_b is a different view (interpretation) of the same data

present in the underlying storage that is shared between both
tensors.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

MEMORY ALLOCATORS (CPU/GPU)

É The tensor storage can be allocated either in the CPU memory
or GPU, therefore a mechanism is required to switch between
these different allocations:

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

MEMORY ALLOCATORS (CPU/GPU)

É The tensor storage can be allocated either in the CPU memory
or GPU, therefore a mechanism is required to switch between
these different allocations:
struct Allocator {
virtual ~Allocator() {}
virtual DataPtr allocate(size_t n) const = 0;
virtual DeleterFnPtr raw_deleter() const {...}
void* raw_allocate(size_t n) {...}
void raw_deallocate(void* ptr) {...}
};
É There are Allocator s that will use the GPU allocators such as
cudaMallocHost() when the storage should be used for the
GPU or posix_memalign() POSIX functions for data in the
CPU memory.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

THE BIG PICTURE

Tensor object Storage object
Raw Data
Storage *storage DataPtr data_ptr
(object fields) Allocator *allocator
(object fields) Allocator object

raw_allocate()
raw_deallocate()
(object fields)

É The Tensor has a Storage which in turn has a pointer to

the raw data and to the Allocator to allocate memory
according to the destination device.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Section II

[ JIT \

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

É PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

É PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;

É However, this poses problems for optimization and for

decoupling it from Python (the model itself is Python code);

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

É PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;

É However, this poses problems for optimization and for

decoupling it from Python (the model itself is Python code);

É PyTorch 1.0 introduced torch.jit , which has two main

methods to convert a PyTorch model to a serializable and
optimizable format;

É TorchScript was also introduced as a statically-typed subset of

Python;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

Two very different worlds with their own requirements.

EAGER MODE SCRIPT MODE

tracing
Prototype, debug, train, Optimization, other
experiment
" languages, deployment

# scripting !

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
>>> ftrace.graph
graph(%x : Float(2, 2)) {
%4 : Float() = prim::Constant[value={2}]()
%5 : Device = prim::Constant[value="cpu"]()
%6 : int = prim::Constant[value=6]()
%7 : bool = prim::Constant[value=0]()
%8 : bool = prim::Constant[value=0]()
%9 : Float() = aten::to(%4, %5, %6, %7, %8)
%10 : Float() = aten::detach(%9)
return (%10); }
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

TRACING
To call the JIT’ed function, just call the forward() method:

>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING
To call the JIT’ed function, just call the forward() method:

>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)

However, tracing will not record any control-flow like if statements

or loops, it executes the code with the given context and creates the
graph. You can see this limitation below:

>>> x = torch.ones(2, 2).add_(1.0)

>>> ftrace.forward(x)
tensor(2.)
According to my_function() , result should have been 1.0. Tracing
also checks for differences between traced and Python function, but
what about Dropout ? PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

SCRIPTING
Another alternative is to use scripting, where you can use decorators
such as @torch.jit.script :

@torch.jit.script
def my_function(x):
if bool(x.mean() > 1.0):
r = 1
else:
r = 2
return r

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING
>>> my_function.graph
graph(%x : Tensor) {
%2 : float = prim::Constant[value=1]()
%5 : int = prim::Constant[value=1]()
%6 : int = prim::Constant[value=2]()
%1 : Tensor = aten::mean(%x)
%3 : Tensor = aten::gt(%1, %2)
%4 : bool = prim::Bool(%3)
%r : int = prim::If(%4)
block0() {
-> (%5)
}
block1() {
-> (%6)
}
return (%r);
}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2

>>> x = torch.ones(2, 2).add_(1.0)

>>> my_function(x)
1
Control-flow logic was preserved !

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?
É The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?
É The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;

É This opens the door to:

É Decouple the model (computationl graph) from Python runtime;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?
É The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;

É This opens the door to:

É Decouple the model (computationl graph) from Python runtime;

É Use it in production with C++ (no GIL) or other languages;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?
É The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;

É This opens the door to:

É Decouple the model (computationl graph) from Python runtime;

É Use it in production with C++ (no GIL) or other languages;

É Capitalize on optimizations (whole program);

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?
É The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;

É This opens the door to:

É Decouple the model (computationl graph) from Python runtime;

É Use it in production with C++ (no GIL) or other languages;

É Capitalize on optimizations (whole program);

É Split the development world of hackable and easy to debug from

the world of putting these models in production and optimize
them.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

BUILDING THE IR
To build the IR, PyTorch takes leverage of the Python Abstract
Syntax Tree (AST) which is a tree representation of the syntactic
structure of the source code.
>>> ast_mod = ast.parse("print(1 + 2)")
>>> astpretty.pprint(ast_mod.body[0], show_offsets=False)

Expr(
value=Call(
func=Name(id='print', ctx=Load()),
args=[
BinOp(
left=Num(n=1),
op=Add(),
right=Num(n=2),
),
],
keywords=[],
),
)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

BUILDING THE IR
print(1 + 2)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PY TORCH JIT PHASES

! Parsing " Checking # Optimization

& or ' $ Translation ○ Execution

AST Code

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

EXECUTING
Just like Python interpreter executes your code, PyTorch has a
interpreter that executes the IR instructions:
bool runImpl(Stack& stack) {
auto& instructions = function->instructions;
size_t last = instructions.size();

while (pc < last) {

auto& inst = instructions[pc];
try {
loadTensorsFromRegisters(inst.inputs, stack);
size_t new_pc = pc + 1 + inst.callback(stack);
for (int i = inst.outputs.size - 1; i >= 0; --i) {
int reg = get(inst.outputs, i);
registers[reg] = pop(stack);
}
pc = new_pc;

// (...) omitted
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

OPTIMIZATIONS
Many optimizations can be used on the computational graph of the
model, such as Loop Unrolling:

for i.. i+= 1 for i.. i+= 4

for j.. for j..
code(i, j) code(i, j)
code(i+1, j)
code(i+2, j)
code(i+3, j)
remainder loop

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

OPTIMIZATIONS
Also Peephole optimizations such as:

x.t().t() = x

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

OPTIMIZATIONS
Also Peephole optimizations such as:

x.t().t() = x

Example:
def dumb_function(x):
return x.t().t()
>>> traced_fn = torch.jit.trace(dumb_function,
... torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
return (%x);
}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

OPTIMIZATIONS
Also Peephole optimizations such as:

x.t().t() = x

Example:
def dumb_function(x):
return x.t().t()
>>> traced_fn = torch.jit.trace(dumb_function,
... torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
return (%x);
}

Other optimizations include Constant Propagation, Dead Code

Elimination (DCE), fusion, inlining, etc.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
$ file resnet.pt
resnet.pt: Zip archive data

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
$ file resnet.pt
resnet.pt: Zip archive data
$ unzip resnet.pt
Archive: resnet.pt
extracting: resnet/version
extracting: resnet/code/resnet.py
extracting: resnet/model.json
extracting: resnet/tensors/0
(...)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SERIALIZATION
code/resnet.py
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
input_2 = torch._convolution(input_1, self.conv1.weight, ...)
# (...)
input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
self.bn1.running_mean, self.bn1.running_var, ...)
# (...)

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

model.json
{"parameters":
[{ "isBuffer": false,
"tensorId": "1",
"name": "weight" }],
"name": "conv1",
"optimize": true}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

model.json model.json
{"parameters": [{"isBuffer": true,
[{ "isBuffer": false, "tensorId": "4",
"tensorId": "1", "name": "running_mean"},
{"isBuffer": true,
"name": "weight" }], "tensorId": "5",
"name": "conv1", "name": "running_var"}],
"optimize": true} "name": "bn1",
"optimize": true}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

USING THE MODEL IN C++

PyTorch also has a C++ API that you can use to load/train models in
C++. This is good for production, mobile, embedded devices, etc.

Example of loading a traced model in PyTorch C++ API:

#include <torch/script.h>
int main(int argc, const char* argv[])
{
auto module = torch::jit::load("resnet.pt");
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));
at::Tensor output = module->forward(inputs).toTensor();
}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

USING THE MODEL IN NODEJS

Complete tutorial at https://goo.gl/7wMJuS.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Section III

[ PRODUCTION \

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ISSUES WITH TUTORIALS

É Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
É They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ISSUES WITH TUTORIALS

É They don’t pay attention to zero-copy practices, so they often

transform, reshape, convert to numpy, convert to PyTorch, etc;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ISSUES WITH TUTORIALS

É They don’t pay attention to zero-copy practices, so they often

transform, reshape, convert to numpy, convert to PyTorch, etc;

É They often use HTTP/1;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ISSUES WITH TUTORIALS

É They don’t pay attention to zero-copy practices, so they often

transform, reshape, convert to numpy, convert to PyTorch, etc;

É They often use HTTP/1;

É They seldom do batching (important for GPUs);

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ISSUES WITH TUTORIALS

É They don’t pay attention to zero-copy practices, so they often

transform, reshape, convert to numpy, convert to PyTorch, etc;

É They often use HTTP/1;

É They seldom do batching (important for GPUs);

É They never put that "production" code in production.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

PREFER BINARY SERIALIZATION FORMATS

Prefer using good binary serialization methods such as Protobuf
that offers a schema and a schema evolution mechanism.

Example from EuclidesDB RPC message:

message AddImageRequest {
int32 image_id = 1;
bytes image_data = 2;
// This field can encode JSON data
bytes image_metadata = 3;
repeated string models = 4;
}

* http://euclidesdb.readthedocs.io PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

AVOID EXTRA COPIES

É Be careful to avoid extra copies of your tensors, especially during
pre-processing;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

AVOID EXTRA COPIES

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

AVOID EXTRA COPIES

É Be careful to avoid extra copies of your tensors, especially during
pre-processing;
É You can use in-place operations. It is a functional anti-pattern
because it introduces side-effects, but it’s a fair price to pay for
performance;
É Caveat: in-place operations doesn’t make much sense when you
need gradients. PyTorch uses tensor versioning to catch that:
>>> a = torch.tensor(1.0, requires_grad=True)
>>> y = a.tanh()
>>> y.add_(2.0)
>>> y.backward() # error !
>>> a._version
0
>>> y._version
1
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0
Client Server
Time

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0 HTTP 1.1 - Pipelining
Client Server Client Server

Time
Time

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0 HTTP 1.1 - Pipelining HTTP 1.1 - HoL
Client Server Client Server Client Server

Time

Time
Time

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0 HTTP 1.1 - Pipelining HTTP 1.1 - HoL HTTP 2.0 - Multiplexing
Client Server Client Server Client Server Client Server

Time

Time
Time

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0 HTTP 1.1 - Pipelining HTTP 1.1 - HoL HTTP 2.0 - Multiplexing
Client Server Client Server Client Server Client Server

Time

Time
Time

É Use HTTP 2.0 if possible, and avoid the head-of-line blocking;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

A TALE OF TWO HTTPS

HTTP 1.0 HTTP 1.1 - Pipelining HTTP 1.1 - HoL HTTP 2.0 - Multiplexing
Client Server Client Server Client Server Client Server

Time

Time
Time

É Use HTTP 2.0 if possible, and avoid the head-of-line blocking;

É Even better, you can use frameworks such as gRPC that uses
HTTP/2.0 and Protobuf.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

BATCHING
Batching data is a way to amortize the performance bottleneck.

Non-batching
Requests GPU

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

BATCHING
Batching data is a way to amortize the performance bottleneck.

Non-batching
Requests GPU

Requests Batching
GPU
Batch 2 Batch 1

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Section IV

[ Q&A \

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Q&A

Thanks !

PyTorch under the hood - Christian S. Perone (2019)

Deep Learning R18 Jntuh Lab Manual
0% (1)
Deep Learning R18 Jntuh Lab Manual
21 pages
Tutorials
No ratings yet
Tutorials
17 pages
Driving Book - Tieng Viet
No ratings yet
Driving Book - Tieng Viet
144 pages
Data Virtuality Best Practices
No ratings yet
Data Virtuality Best Practices
18 pages
Tensors
No ratings yet
Tensors
12 pages
Pytorch Fundamentals
No ratings yet
Pytorch Fundamentals
23 pages
WWW Learnpytorch
No ratings yet
WWW Learnpytorch
14 pages
PyTorch Fundamentals - Zero To Mastery Learn PyTorch For Deep Learning
No ratings yet
PyTorch Fundamentals - Zero To Mastery Learn PyTorch For Deep Learning
45 pages
Pytorch For Beginners
No ratings yet
Pytorch For Beginners
13 pages
00 Pytorch Fundamentals - Ipynb - Colab
No ratings yet
00 Pytorch Fundamentals - Ipynb - Colab
24 pages
L2 Tensors Multidimensional Arrays
No ratings yet
L2 Tensors Multidimensional Arrays
48 pages
PyTorch
No ratings yet
PyTorch
5 pages
Lab 5
No ratings yet
Lab 5
27 pages
Pytorch: Tensors and Datasets
No ratings yet
Pytorch: Tensors and Datasets
9 pages
Pytorch Basics - For Absolute Beginners - Sel, Tam (Sel, Tam) - 2021 - Anna's Archive - Copie
No ratings yet
Pytorch Basics - For Absolute Beginners - Sel, Tam (Sel, Tam) - 2021 - Anna's Archive - Copie
62 pages
DIP Lab 10
No ratings yet
DIP Lab 10
11 pages
PyTorch PDF
No ratings yet
PyTorch PDF
72 pages
Warmup Guide To Pytorch
No ratings yet
Warmup Guide To Pytorch
5 pages
PyTorch For Machine Learning
No ratings yet
PyTorch For Machine Learning
5 pages
Pytorch
No ratings yet
Pytorch
38 pages
unit-4-part-3
No ratings yet
unit-4-part-3
8 pages
Introduction To PyTorch
No ratings yet
Introduction To PyTorch
35 pages
Harvard CS197 Lecture 6 & 7 Notes
No ratings yet
Harvard CS197 Lecture 6 & 7 Notes
18 pages
4. Copy of 03_Building_Your_First_Dataset.ipynb - Colab
No ratings yet
4. Copy of 03_Building_Your_First_Dataset.ipynb - Colab
46 pages
In5400 Week4 2020 Pytorch Lecture4
100% (1)
In5400 Week4 2020 Pytorch Lecture4
84 pages
03_Building_Your_First_Dataset.ipynb - Colab
No ratings yet
03_Building_Your_First_Dataset.ipynb - Colab
42 pages
Recreating PyTorch From Scratch (With GPU Support and Automatic Differentiation)
No ratings yet
Recreating PyTorch From Scratch (With GPU Support and Automatic Differentiation)
35 pages
PyTorch 2_ Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation - Pytorch2-2
No ratings yet
PyTorch 2_ Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation - Pytorch2-2
3 pages
CS236 Introduction To PyTorch
100% (4)
CS236 Introduction To PyTorch
33 pages
Pytorch Tutorial 1
No ratings yet
Pytorch Tutorial 1
48 pages
PyTorch_CrashCourse
No ratings yet
PyTorch_CrashCourse
17 pages
Deep Learning With PyTorch: Object Classification - Filliat Et Al
No ratings yet
Deep Learning With PyTorch: Object Classification - Filliat Et Al
3 pages
PyTorch - Basic Operations
No ratings yet
PyTorch - Basic Operations
20 pages
Ibrahim Badhusha
No ratings yet
Ibrahim Badhusha
37 pages
Deep Learning Lab Manual_Week 1-10
No ratings yet
Deep Learning Lab Manual_Week 1-10
81 pages
Chapter1 Intro
No ratings yet
Chapter1 Intro
35 pages
Stars 4 0 0 0 + Forks 7 0 0 + License MIT
No ratings yet
Stars 4 0 0 0 + Forks 7 0 0 + License MIT
19 pages
1. Introduction to PyTorch
No ratings yet
1. Introduction to PyTorch
9 pages
1. Introduction to PyTorch
No ratings yet
1. Introduction to PyTorch
9 pages
pytorch
No ratings yet
pytorch
2 pages
Day 45 PyTorch Presentation
No ratings yet
Day 45 PyTorch Presentation
67 pages
Py Torch
50% (2)
Py Torch
23 pages
lec-3
No ratings yet
lec-3
30 pages
A Brief Introduction To Pytorch: (A Deep Learning Library)
No ratings yet
A Brief Introduction To Pytorch: (A Deep Learning Library)
32 pages
vertopal.com_PyTorch_CrashCourse
No ratings yet
vertopal.com_PyTorch_CrashCourse
16 pages
unit 4 part 3 dl_1
No ratings yet
unit 4 part 3 dl_1
5 pages
Appendix Tensorflow PDF
50% (8)
Appendix Tensorflow PDF
14 pages
PyTorch 1 - 0 - Bringing Research and Production Together Presentation
No ratings yet
PyTorch 1 - 0 - Bringing Research and Production Together Presentation
108 pages
PyTorch Cheat Sheet
No ratings yet
PyTorch Cheat Sheet
2 pages
1 TensorFlow
No ratings yet
1 TensorFlow
66 pages
Pytorch Tutorial 1 Rev 1
No ratings yet
Pytorch Tutorial 1 Rev 1
48 pages
00 Pytorch and Deep Learning Fundamentals PDF
No ratings yet
00 Pytorch and Deep Learning Fundamentals PDF
44 pages
Deep Learning Library PDF
No ratings yet
Deep Learning Library PDF
12 pages
Pytorch Cheat Sheet For Beginners and Udacity Deep Learning Nanodegree
No ratings yet
Pytorch Cheat Sheet For Beginners and Udacity Deep Learning Nanodegree
23 pages
Deep Learning Unit 4
No ratings yet
Deep Learning Unit 4
11 pages
Pytorch Paper
No ratings yet
Pytorch Paper
12 pages
ML Libraries Ppt(3.3)
No ratings yet
ML Libraries Ppt(3.3)
10 pages
Tensorflow
No ratings yet
Tensorflow
22 pages
S06_DNN_Tensorflow_PyTorch_wip
No ratings yet
S06_DNN_Tensorflow_PyTorch_wip
24 pages
Mastering Python Programming: A Comprehensive Guide: The IT Collection
From Everand
Mastering Python Programming: A Comprehensive Guide: The IT Collection
Christopher Ford
5/5 (1)
Python Simplified
From Everand
Python Simplified
Alisa Turing
No ratings yet
Introduction to Python Programming: Do your first steps into programming with python
From Everand
Introduction to Python Programming: Do your first steps into programming with python
Greytower Corp
No ratings yet
Asking For Help - How Do You Protect Python Source Code - Python Wiki
No ratings yet
Asking For Help - How Do You Protect Python Source Code - Python Wiki
3 pages
Thuy Trang Nguyen: Write A Post..
No ratings yet
Thuy Trang Nguyen: Write A Post..
4 pages
TIGER Instruction Manual JKW A PDF
No ratings yet
TIGER Instruction Manual JKW A PDF
106 pages
Big-O Algorithm Complexity Cheat Sheet (Know Thy Complexities!) @ericdrowell
No ratings yet
Big-O Algorithm Complexity Cheat Sheet (Know Thy Complexities!) @ericdrowell
9 pages
T-Shaped Skills Builder Guide in 2020 For End-To-End Data Scientist
No ratings yet
T-Shaped Skills Builder Guide in 2020 For End-To-End Data Scientist
11 pages
Giao Trinh Pythong Eng
No ratings yet
Giao Trinh Pythong Eng
436 pages
Influence Maps
No ratings yet
Influence Maps
111 pages
Defect Tracking Interview Questions
No ratings yet
Defect Tracking Interview Questions
4 pages
Dimensional Modeling in Data Warehousing: Bachelor of Technology Computer Science and Engineering
No ratings yet
Dimensional Modeling in Data Warehousing: Bachelor of Technology Computer Science and Engineering
15 pages
DR AF T DR AF T: Business Analyst As Scrum Product Owner
No ratings yet
DR AF T DR AF T: Business Analyst As Scrum Product Owner
42 pages
Oracle Installation Steps
No ratings yet
Oracle Installation Steps
4 pages
CPT Project Report Adithya
No ratings yet
CPT Project Report Adithya
20 pages
Vision Document: For Hotel Reservation System (HRS)
No ratings yet
Vision Document: For Hotel Reservation System (HRS)
7 pages
Marking Scheme Computer Science (Code: 083)
No ratings yet
Marking Scheme Computer Science (Code: 083)
6 pages
Cloud Computing Assignment-Week 1 Type of Question: MCQ/MSQ Number of Questions: 10 Total Mark: 10 X 1 10
No ratings yet
Cloud Computing Assignment-Week 1 Type of Question: MCQ/MSQ Number of Questions: 10 Total Mark: 10 X 1 10
4 pages
FS - Change in ZSUMMARY Report
100% (1)
FS - Change in ZSUMMARY Report
5 pages
HP Service Manager Web Services
No ratings yet
HP Service Manager Web Services
179 pages
Agile Methodology
No ratings yet
Agile Methodology
46 pages
SSQ Lec 8
No ratings yet
SSQ Lec 8
58 pages
HSM Userguide
No ratings yet
HSM Userguide
128 pages
7 From Prof. Dullea: CSC8490 Introduction To PL/SQL
No ratings yet
7 From Prof. Dullea: CSC8490 Introduction To PL/SQL
36 pages
Ooad QB
No ratings yet
Ooad QB
11 pages
Database Management System Lab Course Code: CSE 312: Instructor: Tasnim Tarannum Lecturer CSE Department
No ratings yet
Database Management System Lab Course Code: CSE 312: Instructor: Tasnim Tarannum Lecturer CSE Department
15 pages
Bartender Commander Examples
No ratings yet
Bartender Commander Examples
19 pages
Borland Samples: Mark Smallwood
No ratings yet
Borland Samples: Mark Smallwood
130 pages
Exam Instructions CB1 and CB2
No ratings yet
Exam Instructions CB1 and CB2
1 page
EMC VMAX3 - CLI Cheat Sheet - David Ring
No ratings yet
EMC VMAX3 - CLI Cheat Sheet - David Ring
8 pages
ShivanshSrivastava Resume
No ratings yet
ShivanshSrivastava Resume
1 page
RedHat PracticeTest EX200 v2018-10-10 by Kamari 33q PDF
No ratings yet
RedHat PracticeTest EX200 v2018-10-10 by Kamari 33q PDF
22 pages
TY BSC (CS) SemV Java Practical Slips-Oct2022
100% (2)
TY BSC (CS) SemV Java Practical Slips-Oct2022
14 pages
Seminar On: 5G Mobile Wireless Network - Security & Challenges
No ratings yet
Seminar On: 5G Mobile Wireless Network - Security & Challenges
28 pages
Secure, Resilient, And Agile Software Development by Mark Merkow · 2019
No ratings yet
Secure, Resilient, And Agile Software Development by Mark Merkow · 2019
239 pages
Confirm Elicitation Results
No ratings yet
Confirm Elicitation Results
2 pages
11 - Chapter 6 - Queries Involving More Than One Relation
No ratings yet
11 - Chapter 6 - Queries Involving More Than One Relation
29 pages
Instructions - FAQ's - Onsite Coding Test
No ratings yet
Instructions - FAQ's - Onsite Coding Test
6 pages
Scan Report
No ratings yet
Scan Report
29 pages