DeepLearning MaterialsTextures GDC17 FINAL
Session Description: Recently deep learning has revolutionized computer vision and other
recognition problems. Everyday applications using such techniques are now commonplace
with more advanced tasks being automated at a growing rate. During 2016, “image synthesis”
techniques started to appear that used deep neural networks to apply style transfer
algorithms for image restoration. The speakers review some of these techniques and
demonstrate their application in image magnification to enable “super resolution” tools.
The speakers also discuss recent discoveries by NVIDIA Research that use AI, machine
learning, and deep learning-based approaches to greatly improve the process of creating
game-ready materials. Using these novel techniques, artists can use standard DSLR, or even
cell phone cameras, to create full renderable materials in minutes. The session concludes by
showing how developers can integrate these methods into their existing art pipelines.
Takeaway: Attendees will gain information about the latest application of machine and deep
learning for content creation and get access to new resources to improve their work.
Intended Audience: Texture artists, art directors, tool programmers, anyone interested in
the latest evolution of deep learning in game development.
1
Overview
Welcome
What is Deep Learning?
“GameWorks: Materials & Textures” [producers and artists rejoice]
Examine in detail the design of one tool [coders bathe in technical details]
Wrap up
gameworks.nvidia.com 2
2
Deep Learning – What is it?
AI vs ML vs DL - great explanation https://goo.gl/hkayWG
Why now?
Better algorithms
Large datasets
Machine Learning at its most basic is the practice of using algorithms to parse data, learn
from it, and then make a determination or prediction about something in the world. So rather
than hand-coding software routines with a specific set of instructions to accomplish a
particular task, the machine is “trained” using large amounts of data and algorithms that give
it the ability to learn how to perform the task.
One approach to ML is "artificial neural networks" (ANNs): basically, use "simple" math in a
distributed way to mimic the way we think neurons in the brain work. For years ANNs
produced little of value, until Prof. Hinton at the University of Toronto parallelized the
algorithms, they were moved onto GPUs, and training datasets exploded in size.
3
Deep Learning is Ready For Use
Already many ways to use deep learning today
Just in: Baidu DeepVoice
Chat bots
Check services from Google, AWS, Azure if you don’t “roll your own”
gameworks.nvidia.com 4
4
Deep Learning for Art Right Now
Style transfer
Generative networks creating images and voxels
Adversarial networks (DCGAN) – still early but promising
Artomatix
Allegorithmic
Autodesk
gameworks.nvidia.com 5
5
Style Transfer: Something Fun!
Doodle a masterpiece!
Sept 2015: A Neural Algorithm of Artistic Style
by Gatys et al
Uses CNN to take the “style” from one image and
apply it to another
References:
A Neural Algorithm of Artistic Style paper by Leon A. Gatys, Alexander S. Ecker, and
Matthias Bethge
Services:
http://ostagram.ru/static_pages/lenta
https://www.instapainting.com/ai-painter
iOS app (calls out to server) http://prisma-ai.com/
6
http://ostagram.ru/static_pages/lenta gameworks.nvidia.com 7
But in addition to being a great toy, there is great potential – I mean, the AI is
actually drawing pixels in a meaningful way.
Style Transfer: Something Useful
Game remaster & texture enhancement
Try Neural Style and use a real-world photo for the “style”
gameworks.nvidia.com 8
8
NVIDIA’s Goals for DL in Game Development
Looking at all the research, clearly there’s scope for tools based on DL
Goals:
Expand the use of deep learning into content creation
gameworks.nvidia.com 9
9
“GameWorks: Materials & Textures”
Set of tools targeting the game industry using machine learning and deep learning
https://gwmt.nvidia.com
Super-resolution
Texture Multiplier
gameworks.nvidia.com 10
10
GameWorks: Materials & Textures beta
Tools run as a web service
Sign up for the Beta at: https://gwmt.nvidia.com
Seeking feedback from artists on usage of tools and quality
Also interested in feedback from programmers on automation, pipeline and
engine integration
gameworks.nvidia.com 11
11
Photo To Material: 2Shot
From two photos of a surface, generate a “material”
Based on a SIGGRAPH 2015 paper by NVResearch and Aalto University (Finland)
“Two-Shot SVBRDF Capture for Stationary Materials”
https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/
Or align later
gameworks.nvidia.com 12
12
Material Synthesis from Two Photos
[Output maps: diffuse albedo, specular, normals, glossiness, anisotropy]
gameworks.nvidia.com 13
13
Material Synthesis Process
gameworks.nvidia.com 14
14
Demo
Photo To Material: 2Shot
15
Photo To Material: 1Shot
What’s better than two photos? One!
SIGGRAPH 2016 paper by NVResearch and Aalto University (Finland)
“Reflectance modeling by neural texture synthesis”
http://dl.acm.org/citation.cfm?id=2925917&preflayout=flat
gameworks.nvidia.com 16
16
1shot – EARLY Previews
gameworks.nvidia.com 17
17
Texture Multiplier
Put simply: texture in, new texture out
Inspired by Gatys et al
Texture Synthesis Using Convolutional Neural Networks
https://arxiv.org/pdf/1505.07376.pdf
Artomatix
Similar product “Texture Mutation”
https://artomatix.com/
gameworks.nvidia.com 18
Currently “Beta”
Some artifacts – 256x256 now, with 512 and 1024 coming
18
Super Resolution
Final tool in the first roll-out of GameWorks: Materials & Textures
Introduce Dmitry and Marco
Deep dive on the tool and explanation of some recent DL-based research and techniques
gameworks.nvidia.com 19
19
Zoom! Enhance!
Yes
Sure!
gameworks.nvidia.com 20
20
Super-resolution: the task
[Diagram: a given low-resolution image (W × H) is upscaled to a constructed high-resolution image (nW × nH)]
gameworks.nvidia.com 21
The task is to "generate" a bigger image from a smaller one. If we want to use
machine learning to do this, we can create two sets, one of big images and one of
their downscaled versions, and train our system with these two sets.
21
Super-resolution as reconstruction task
[Diagram: unknown original high-resolution image → downscaling → given image → reconstruction → reconstructed image]
22
Super-resolution: ill-posed task
[Diagram: pixels of the original image → downscaling (information is lost here) → pixels of the given image → reconstruction → pixels of the reconstructed image, with the missing pixels shown as "?"]
gameworks.nvidia.com 23
But the problem is ill-posed. We first remove some information, and then try to
reconstruct the image using less data (1/4 in this case; 1/n² in general for a downscale
factor n).
23
Super-resolution: ill-posed task
[Same diagram as the previous slide]
gameworks.nvidia.com 24
24
Super-resolution: ill-posed task
OR DO YOU?
gameworks.nvidia.com 25
25
Where does the magic come from?
•Let’s consider 8x8 patch of some 8-bit grayscale image
•How many of such patches are there?
gameworks.nvidia.com 26
Let’s consider a small portion of the original image, say 8x8 patch, and let’s consider
a single channel of 8 bit.
26
Where does the magic come from?
•Let’s consider 8x8 patch of some 8-bit grayscale image
•How many of such patches are there?
N = 256(8∗8) ≈ 10153
gameworks.nvidia.com 27
The number of possible values for the pixel is 256, and the number of pixels is
8x8=64, so the total number of possible images is quite big
27
Where does the magic come from?
•Let’s consider 8x8 patch of some 8-bit grayscale image
•How many of such patches are there?
N = 256(8∗8) ≈ 10153
•More than the number of atoms in observable universe
That’s actually more atoms than the observable universe, maybe an image contains
less information than this.
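For reference, the count is easy to check with Python's arbitrary-precision integers:

```python
# Number of distinct 8x8 patches of an 8-bit grayscale image.
n_patches = 256 ** (8 * 8)      # 256 intensity levels per pixel, 64 pixels
print(len(str(n_patches)) - 1)  # order of magnitude of the count: 154
print(n_patches > 10 ** 80)     # True: more than the ~10^80 atoms in the observable universe
```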
28
Where does the magic come from?
Photos
Natural images
Textures
Indeed, among the possible images, photos and textures are a very small subset
29
Super-resolution under constraints
•Data from natural images is sparse or compressible in some domain
•To reconstruct such images some prior information or constraints are required
[Diagram: downscaling → reconstruction + prior information + constraints]
gameworks.nvidia.com 30
If we constrain our problem to deal with natural images and textures, we can enhance
the content without much loss.
30
Hand-crafted constraints and priors
•Interpolation (bicubic, lanczos, etc.)
•Interpolation + Sharpening (and other filtering)
gameworks.nvidia.com 31
One possible option is to construct an upscaling method that makes some a priori decisions
about the resulting image (e.g. sharpness), as in the sketch below.
This will work in some cases, but in general it requires a lot of manual work to hand-craft
the upscaling logic into our algorithm.
We need a better method, something that looks at images from our specific domain
and finds which features are interesting.
Such methods are usually machine learning methods.
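For illustration, here is a minimal sketch of such a hand-crafted baseline using Pillow (bicubic upscaling followed by an unsharp mask); the file names, scale factor, and filter settings are placeholder assumptions, not part of the tools discussed here.

```python
from PIL import Image, ImageFilter

# Hand-crafted baseline: bicubic interpolation followed by a fixed sharpening step.
lr = Image.open("input_lr.png")                    # placeholder path
scale = 4
hr = lr.resize((lr.width * scale, lr.height * scale), Image.BICUBIC)
# The "prior" is hard-coded: assume the upscaled result should simply look sharper.
hr = hr.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
hr.save("output_sr_baseline.png")
```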
31
Super-resolution: machine learning
Idea: use machine learning to capture prior knowledge and statistics from the data
[Diagram: machine learning at the intersection of mathematical optimization, computer science, and statistics]
gameworks.nvidia.com 32
The idea is to exploit prior knowledge about our image domain, and we can gather such
knowledge using machine learning: a technique for building intelligent systems that are
not explicitly programmed, but trained via error minimization to capture and exploit the
internal structure and features of the training data automatically.
32
Patch-based mapping
[Diagram: low-resolution patch → mapping (model params) → high-resolution patch]
gameworks.nvidia.com 33
Let's reduce our task to a simpler one: transformation of an image patch. Let's
consider a mapping function which constructs a high-resolution patch from a given low-
resolution patch of the input image. Such a mapping function will depend on a set of
parameters, which we want to find using machine learning.
33
Patch-based mapping: training
[Diagram: training images → pairs of (LR, HR) patches; low-resolution patch → mapping (model params) → high-resolution patch]
gameworks.nvidia.com 34
34
Patch-based mapping: training
[Diagram: training images → pairs of (LR, HR) patches → training of the model params; low-resolution patch → mapping → high-resolution patch]
gameworks.nvidia.com 35
35
Patch-based mapping: training
[Same diagram as the previous slide]
gameworks.nvidia.com 36
After training, we expect our model to be capable of predicting the high-resolution
patch in the most optimal way.
36
Patch-based mapping
[Diagram: LR patch $x_L$ → encode → decode → HR patch $x_H$]
gameworks.nvidia.com 37
A good way to build the mapping function is to use an encoding of an input patch into
some intermediate scale-invariant representation, which will carry some semantic
information about the patch.
37
Patch-based mapping: sparse coding
[Diagram: LR patch $x_L$ → encode → sparse code → decode → HR patch $x_H$]
gameworks.nvidia.com 38
One way to build such a representation is sparse coding. Here we exploit our prior
knowledge that our signal is sparse in some domain.
38
Sparse coding and dictionary learning
•An image patch can be represented as a sparse linear combination of dictionary elements
•The dictionary is learned from the data (in contrast to a hand-crafted dictionary like DCT)
$x = Dz = d_1 z_1 + \dots + d_K z_K$
where $D$ is the dictionary, $x$ is the patch, and $z$ is the sparse code
[Example: $x = 0.8\,d_{36} + 0.3\,d_{42} + 0.5\,d_{63}$]
gameworks.nvidia.com 39
39
Patch-based mapping via sparse coding
Mapping
𝒙𝑳
LR patch
gameworks.nvidia.com 40
40
Patch-based mapping via sparse coding
[Diagram: LR patch $x_L$ → encode with LR dictionary $D_L$ → sparse code $z$]
$z = \arg\min_z \|D_L z - x_L\|_2^2 + \gamma \|z\|_0$
gameworks.nvidia.com 41
41
Patch-based mapping via sparse coding
[Diagram: LR patch $x_L$ → encode with LR dictionary $D_L$ → sparse code $z$ → decode with HR dictionary $D_H$ → HR patch $x_H$]
$z = \arg\min_z \|D_L z - x_L\|_2^2 + \gamma \|z\|_0 \qquad x_H = D_H z$
gameworks.nvidia.com 42
Then, given the sparse codes and high-resolution dictionary, we perform decoding,
simply calculating the linear combination.
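As a rough illustration (not the exact method used by the tool), here is a NumPy sketch of the encode/decode steps, assuming the coupled dictionaries $D_L$ and $D_H$ have already been learned; the greedy matching-pursuit encoder and the toy patch sizes are assumptions made only for this example.

```python
import numpy as np

def omp_encode(D_L, x_L, n_nonzero=3):
    """Greedy (OMP-style) sparse coding of an LR patch against the LR dictionary."""
    residual, support = x_L.copy(), []
    z = np.zeros(D_L.shape[1])
    for _ in range(n_nonzero):
        # Pick the dictionary atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D_L.T @ residual))))
        # Re-fit the coefficients on the selected atoms by least squares.
        coeffs, *_ = np.linalg.lstsq(D_L[:, support], x_L, rcond=None)
        residual = x_L - D_L[:, support] @ coeffs
    z[support] = coeffs
    return z

def decode_hr(D_H, z):
    """Decode: the HR patch is the same sparse combination of HR atoms."""
    return D_H @ z

# Toy shapes: 8x8 LR patches (64-dim), 16x16 HR patches (256-dim), K = 512 atoms.
rng = np.random.default_rng(0)
D_L, D_H = rng.standard_normal((64, 512)), rng.standard_normal((256, 512))
x_H = decode_hr(D_H, omp_encode(D_L, rng.standard_normal(64)))
```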
42
Patch-based mapping via sparse coding
[Image: learned LR dictionary $D_L$ and HR dictionary $D_H$]
gameworks.nvidia.com 43
43
Generalized patch-based mapping
[Diagram: LR patch → mapping → high-level representation of the LR patch ("features") → mapping in the feature space → high-level representation of the HR patch → mapping → HR patch]
gameworks.nvidia.com 44
We may generalize the idea and build another mapping function with a more complex
internal representation. For example, first map the input patch into a corresponding high-
level representation, then perform some transformation in that space, and then map the
resulting high-level representation back to image space, i.e. to a high-resolution patch.
44
Generalized patch-based mapping
[Diagram: LR patch → mapping ($W_1$) → mapping in the feature space ($W_2$) → mapping ($W_3$) → HR patch, with $W_1, W_2, W_3$ the trainable parameters]
gameworks.nvidia.com 45
All transformations depend on some parameters, which we adjust during the training.
This could be a neural net, for example.
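For example, a tiny fully-connected network in PyTorch could play the role of the three trainable transformations $W_1$, $W_2$, $W_3$; the patch and feature sizes below are arbitrary assumptions, just to make the sketch concrete.

```python
import torch
import torch.nn as nn

# LR patch -> feature space (W1) -> transformed features (W2) -> HR patch (W3).
class PatchMapper(nn.Module):
    def __init__(self, lr_size=8, hr_size=16, feat=256):
        super().__init__()
        self.to_features = nn.Linear(lr_size * lr_size, feat)   # W1
        self.in_features = nn.Linear(feat, feat)                 # W2
        self.to_hr_patch = nn.Linear(feat, hr_size * hr_size)    # W3

    def forward(self, x_lr):
        h = torch.relu(self.to_features(x_lr))
        h = torch.relu(self.in_features(h))
        return self.to_hr_patch(h)

model = PatchMapper()
x_lr = torch.rand(32, 8 * 8)   # a batch of flattened 8x8 LR patches
x_hr = model(x_lr)             # predicted 16x16 HR patches (flattened)
```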
45
Mapping of the whole image: using convolution
Convolutional operators
HR image
LR image
gameworks.nvidia.com 46
Now let's recall that we actually want to do a super-resolution for the whole image.
In this case, we can apply our patch-based transformation to the set of all
overlapping patches on the input image, and then assemble resulting high-resolution
patches into high-resolution output. These operations could be implemented via a
convolutional operator. The resulting structure is very similar to one well-known
type of neural network: the auto-encoder.
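A small PyTorch sketch of the patch bookkeeping (extract all overlapping patches, then reassemble and average the overlaps); the patch size and stride are arbitrary choices for this example, and the actual per-patch mapping is left out.

```python
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 64, 64)                     # toy LR image (N, C, H, W)
# Extract all overlapping 8x8 patches as columns: (1, 3*8*8, num_patches).
patches = F.unfold(img, kernel_size=8, stride=4)
# ... here each column would be mapped to its high-resolution counterpart ...
# Reassemble: overlapping patches are summed by fold, so divide by the overlap count.
summed = F.fold(patches, output_size=(64, 64), kernel_size=8, stride=4)
counts = F.fold(torch.ones_like(patches), output_size=(64, 64), kernel_size=8, stride=4)
recon = summed / counts                            # equals img when the mapping is identity
```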
46
Auto-encoder
gameworks.nvidia.com 47
What’s an Auto-Encoder?
It’s a neural network trained to reconstruct its input.
What’s difficult is doing it by passing to an internal representation, with less
information (hourglass structure)
47
Auto-encoder
Encode Decode
features
gameworks.nvidia.com 48
An autoencoder network is composed of two parts: an ENCODER, which takes the input
and converts it to the internal representation (feature space), and a DECODER, which
tries to regenerate the input.
48
Auto-encoder
[Diagram: input $x$ → auto-encoder $F_W$ with parameters $W$ → output $y$]
Inference: $y = F_W(x)$
Training: $W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$, where $\{x_i\}$ is the training set
gameworks.nvidia.com 49
When the encoder and decoder are modeled by a DNN, the parameter space is defined by
a set of weights (W).
During training we try to minimize a specific loss function (a "distance" between the
input and the output). If there's enough information in the middle layer plus the prior
knowledge, the reconstruction will be perfect (the distance will be 0); if there isn't
enough information, the network will still minimize the distance measured on the
training set, as in the sketch below.
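As a minimal sketch of this objective (not the production setup), a toy hourglass auto-encoder and its training loop in PyTorch might look like this; the layer sizes, optimizer, and random stand-in data are assumptions.

```python
import torch
import torch.nn as nn

# Hourglass auto-encoder: the middle layer holds less information than the input.
class AutoEncoder(nn.Module):
    def __init__(self, dim=784, bottleneck=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.decode = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.decode(self.encode(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dist = nn.MSELoss()                            # the Dist(x, F_W(x)) of the slide

for x in torch.rand(100, 16, 784).unbind(0):   # stand-in for real training batches
    opt.zero_grad()
    loss = dist(model(x), x)                   # W = argmin_W sum_i Dist(x_i, F_W(x_i))
    loss.backward()
    opt.step()
```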
49
Auto-encoder
Our encoder is LOSSY by definition
[Diagram: input → encode → features, with information loss]
gameworks.nvidia.com 50
50
Super-resolution auto-encoder
[Diagram: input $x$ → super-resolution auto-encoder $F_W$ with parameters $W$ → output $y$]
Inference: $y = F_W(x)$
Training: $W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$, where $\{x_i\}$ is the training set
gameworks.nvidia.com 51
51
Network topology
Using global information
Fixed-resolution
Better result (?)
gameworks.nvidia.com 52
Using all pixels in the image: does this mean better results? Maybe.
Using only local information we have fewer parameters and a scalable network. Does this
mean lower quality? Not necessarily, since we are exploiting LOCAL information.
52
Super-resolution convolutional auto-encoder
[Diagram: input $x$ → convolutional auto-encoder with parameters $W$ → output $y$]
Only use size-independent layers:
Convolution
Downscaling: pooling, strided convolution
Upscaling: data replication, interpolation, deconvolution
53
Super-resolution convolutional auto-encoder
Why Downscaling?
Collect multi-scale information
Deeper features
gameworks.nvidia.com 54
54
SRCAE: Overview
In Down … Down Up … Up Out
gameworks.nvidia.com 55
55
SRCAE: Input translation
In Down … Down Up … Up Out
“In” block
Convolution (5x5)
Feature expansion (3->32)
ReLU
gameworks.nvidia.com 56
56
SRCAE: Encoder
In Down … Down Up … Up Out
“Down” block
3x3 Convolution
ReLU
3x3 Convolution
ReLU
3x3 Strided (2x) convolution with feature expansion
ReLU
gameworks.nvidia.com 57
57
SRCAE: Decoder
In Down … Down Up … Up Out
“Up” block
3x3 Convolution
ReLU
3x3 Convolution
ReLU
3x3 Strided (2x) deconvolution with feature reduction
ReLU
gameworks.nvidia.com 58
58
SRCAE: Output
In Down … Down Up … Up Out
gameworks.nvidia.com 59
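Putting the In / Down / Up blocks described above together, a PyTorch sketch of such a network might look like the following. The channel counts, the number of blocks, the extra Up block that provides the net 2x upscaling, and the final Out convolution back to RGB are all assumptions; the slides do not specify them.

```python
import torch
import torch.nn as nn

def down_block(ch_in, ch_out):
    # Two 3x3 convolutions, then a 2x-strided 3x3 convolution with feature expansion.
    return nn.Sequential(
        nn.Conv2d(ch_in, ch_in, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch_in, ch_in, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch_in, ch_out, 3, stride=2, padding=1), nn.ReLU())

def up_block(ch_in, ch_out):
    # Two 3x3 convolutions, then a 2x-strided 3x3 deconvolution with feature reduction.
    return nn.Sequential(
        nn.Conv2d(ch_in, ch_in, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch_in, ch_in, 3, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(ch_in, ch_out, 3, stride=2, padding=1, output_padding=1),
        nn.ReLU())

class SRCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(),    # "In": 5x5 conv, 3 -> 32
            down_block(32, 64), down_block(64, 128),      # "Down" blocks
            up_block(128, 64), up_block(64, 32),          # "Up" blocks
            up_block(32, 32),                             # extra "Up" gives the net 2x upscale
            nn.Conv2d(32, 3, 5, padding=2))               # "Out": back to RGB (assumed)

    def forward(self, x):
        return self.net(x)

y = SRCAE()(torch.rand(1, 3, 64, 64))   # -> (1, 3, 128, 128): one octave of super-resolution
```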
59
SRCAE: Training
[Diagram: HR image $x$ → downscaling $D$ → LR image $\hat{x}$ → SRCAE $F_W$ (parameters $W$) → output $y$]
gameworks.nvidia.com 60
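A hedged sketch of this training loop, reusing the SRCAE class from the previous sketch; the stand-in HR data and bicubic downscaling are assumptions standing in for the real training set and whatever downscaler D is actually used.

```python
import torch
import torch.nn.functional as F

model = SRCAE()                                   # from the sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
hr_training_images = [torch.rand(4, 3, 128, 128) for _ in range(10)]  # stand-in HR batches

for x in hr_training_images:
    x_lr = F.interpolate(x, scale_factor=0.5,     # downscaling D
                         mode='bicubic', align_corners=False)
    opt.zero_grad()
    loss = F.mse_loss(model(x_lr), x)             # Dist(x, F_W(D(x)))
    loss.backward()
    opt.step()
```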
60
SRCAE: Inference
[Diagram: given LR image $\hat{x}$ → SRCAE $F_W$ (parameters $W$) → constructed HR image $y$]
$y = F_W(\hat{x})$
gameworks.nvidia.com 61
61
Super-resolution: ill-posed task?
gameworks.nvidia.com 62
62
Distance/Loss function
The distance function is a key element in obtaining good results.
$W = \arg\min_W \sum_i D(x_i, F_W(x_i))$
gameworks.nvidia.com 63
MSE, L2 and L1 metrics will eventually converge to the results shown before, and indeed
we started with MSE, but we obtained better results with another metric.
63
Loss function
MSE (Mean Squared Error): $\frac{1}{N}\|x - F(x)\|^2$
gameworks.nvidia.com 64
Loss function is important. Generally, people use the MSE loss function, which stands
for mean squared error.
64
Loss function
MSE (Mean Squared Error): $\frac{1}{N}\|x - F(x)\|^2$
PSNR (Peak Signal-to-Noise Ratio): $10 \cdot \log_{10}\!\left(\frac{\mathit{MAX}^2}{\mathrm{MSE}}\right)$
gameworks.nvidia.com 65
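Both metrics are a couple of lines of NumPy (a sketch; MAX is 255 for 8-bit images):

```python
import numpy as np

def mse(x, fx):
    # Mean squared error between the original image x and the reconstruction F(x).
    return np.mean((np.asarray(x, dtype=np.float64) - np.asarray(fx, dtype=np.float64)) ** 2)

def psnr(x, fx, max_val=255.0):
    # Peak signal-to-noise ratio, in dB.
    return 10.0 * np.log10(max_val ** 2 / mse(x, fx))
```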
65
Loss function: HFEN
MSE (Mean Squared Error): $\frac{1}{N}\|x - F(x)\|^2$
PSNR (Peak Signal-to-Noise Ratio): $10 \cdot \log_{10}\!\left(\frac{\mathit{MAX}^2}{\mathrm{MSE}}\right)$
HFEN* (High Frequency Error Norm, HP = high-pass filter): $\|\mathrm{HP}(x - F(x))\|^2$
Perceptual loss
* http://ieeexplore.ieee.org/document/5617283/ gameworks.nvidia.com 66
66
Perceptual loss
[Diagram: image $x$ → $G(x)$ → perceptual features]
gameworks.nvidia.com 67
We can generalize this idea. Suppose we have some transformation that extracts
perceptual features.
67
Perceptual loss
[Diagram: image $x$ → $G(x)$ → perceptual features]
Perceptual features:
• High-frequency information: $G(x) = \frac{1}{N}\,\mathrm{HP}(x)$
• CNN features*: $G(x) = \mathrm{VGG}(x)$
• Other
* https://arxiv.org/abs/1603.08155 gameworks.nvidia.com 68
68
Perceptual loss
[Diagram: image $x$ → $G(x)$ → perceptual features]
Perceptual features:
• High-frequency information: $G(x) = \frac{1}{N}\,\mathrm{LoG}(x)$
• CNN features*: $G(x) = \mathrm{VGG}(x)$
• Other
$L = \frac{1}{N}\|x - F(x)\|^2 + \alpha\,\|G(x) - G(F(x))\|^2$
* https://arxiv.org/abs/1603.08155 gameworks.nvidia.com 69
Then, having a perceptual loss focused on some specific component, we can construct
the total loss as a weighted sum of the regular content loss and the perceptual loss,
as sketched below.
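As an illustrative sketch of such a weighted sum (not the exact loss used in the tool), the high-frequency term can be approximated with a fixed Laplacian filter in PyTorch; the filter kernel and the weight alpha are assumptions.

```python
import torch
import torch.nn.functional as F

# 3x3 Laplacian kernel as a simple stand-in for the LoG high-pass filter G(x).
LAP = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)

def high_pass(img):
    # Apply the filter to each channel independently (img: N x C x H x W).
    k = LAP.repeat(img.shape[1], 1, 1, 1).to(img.device)
    return F.conv2d(img, k, padding=1, groups=img.shape[1])

def total_loss(x, fx, alpha=0.5):
    content = F.mse_loss(fx, x)                            # regular content loss
    perceptual = F.mse_loss(high_pass(fx), high_pass(x))   # high-frequency error term
    return content + alpha * perceptual
```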
69
Perceptual loss
[Diagram: image $x$ → $G(x)$ → perceptual features]
Perceptual features:
• High-frequency information: $G(x) = \frac{1}{N}\,\mathrm{LoG}(x)$
• CNN features*: $G(x) = \mathrm{VGG}(x)$
• Other
$L = \frac{1}{N}\|x - F(x)\|^2 + \alpha\,\|G_1(x) - G_1(F(x))\|^2 + \beta\,\|G_2(x) - G_2(F(x))\|^2 + \dots$
* https://arxiv.org/abs/1603.08155 gameworks.nvidia.com 70
70
Regular loss
Result 4x Result 4x
gameworks.nvidia.com 71
71
Regular loss + Perceptual loss
Result 4x Result 4x
gameworks.nvidia.com 72
And here is the upscaling with the perceptual loss. Edges have become sharper, and the
aliasing effect is reduced.
72
Demo
Super-Resolution
73
Generative Adversarial Networks
Generator goal: maximize the error of the Discriminator
Discriminator goal: distinguish generated images from real images
gameworks.nvidia.com 74
74
Super-resolution: GAN-based loss
[Diagram: LR image $x$ → generator $F(x)$; generated and real HR images $y$ → discriminator $D(y)$ → real or generated]
gameworks.nvidia.com 75
Super-resolution is also a generative task. So, let's try to apply GANs to it. As a
generator let's take our super-resolution auto-encoder, and as a discriminator, let's
train a binary classifier, which will distinguish upscaled and real high-resolution
images.
This will alter the loss function of our auto-encoder, and such an additional term can
be considered a special type of perceptual loss.
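A minimal sketch of that idea (architecture and weighting are assumptions, not the shipped implementation): a small convolutional binary classifier acts as the discriminator, and the generator's loss gains an adversarial term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Discriminator: does this image look like a real high-resolution image?
discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

def generator_loss(x_hr, y_fake, lam=1e-3):
    # Content term plus adversarial term: try to make D call the upscaled image "real".
    content = F.mse_loss(y_fake, x_hr)
    adversarial = F.binary_cross_entropy_with_logits(
        discriminator(y_fake), torch.ones(y_fake.shape[0], 1))
    return content + lam * adversarial

def discriminator_loss(x_hr, y_fake):
    # Classify real HR images as 1 and generated (upscaled) images as 0.
    real = F.binary_cross_entropy_with_logits(
        discriminator(x_hr), torch.ones(x_hr.shape[0], 1))
    fake = F.binary_cross_entropy_with_logits(
        discriminator(y_fake.detach()), torch.zeros(y_fake.shape[0], 1))
    return real + fake
```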
75
Questions?
Marco Foco, Developer Technology Engineer
Dmitry Korobchenko, Deep Learning R&D Engineer
Andrew Edelsten, Senior Developer Technology Manager
76