BD 10 TensorFlow
Lars Schmidt-Thieme
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
C. Distributed Computing Environments / 3. Computational Graphs (TensorFlow) 1 / 31
Big Data Analytics
Syllabus
Tue. 9.4. (1) 0. Introduction
A. Parallel Computing
Tue. 16.4. (2) A.1 Threads
Tue. 23.4. (3) A.2 Message Passing Interface (MPI)
Tue. 30.4. (4) A.3 Graphical Processing Units (GPUs)
B. Distributed Storage
Tue. 7.5. (5) B.1 Distributed File Systems
Tue. 14.5. (6) B.2 Partitioning of Relational Databases
Tue. 21.5. (7) B.3 NoSQL Databases
C. Distributed Computing Environments
Tue. 28.5. (8) C.1 Map-Reduce
Tue. 4.6. (9) C.2 Resilient Distributed Datasets (Spark)
Tue. 11.6. — — Pentecost Break —
Tue. 18.6. (10) C.3 Computational Graphs (TensorFlow)
D. Distributed Machine Learning Algorithms
Tue. 25.6. (11) D.1 Distributed Stochastic Gradient Descent
Tue. 2.7. (12) D.2 Distributed Matrix Factorization
Tue. 9.7. (13) Questions and Answers
Outline

1. The Computational Graph
2. Variables
3. Example: Linear Regression
4. Automatic Gradients
5. Large Data I: Feeding
6. Large Data II: Reader Nodes
7. Debugging
1. The Computational Graph
TensorFlow
I Computational framework
I multi-device, distributed
I Core in C/C++, standard interface in Python
I several further language bindings, e.g., R, Java
I open source
I developed by Google
I initially released Nov. 2015
I 2nd generation framework
I 1st generation framework was called DistBelief
I alternative: PyTorch
I developed by Facebook, since 2016, open source
Big Data Analytics 1. The Computational Graph
Tensors
I tensor = multidimensional array
I rank = number of dimensions
I shape = vector of sizes,
one size for each dimension.
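For instance (illustrated here with NumPy, whose arrays follow the same conventions), a 2×3 matrix is a tensor of rank 2 with shape (2, 3):

```python
import numpy as np

# A rank-2 tensor (a matrix) with shape (2, 3):
# two dimensions, of sizes 2 and 3.
t = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(t.ndim)   # rank: number of dimensions
print(t.shape)  # shape: one size per dimension
```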
Computational Graphs
I TensorFlow organizes a computation as a directed graph.
I Each node produces a tensor
I or a list of tensors.
Sessions
I A session encapsulates the runtime in which (parts of) the graph are executed.
Two Phases
I 1. graph construction
I 2. executing (parts of) the graph (running)
import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(4.0)
x = tf.add(a, b)

print(a)
print(b)
print(x)

sess = tf.Session()
x_val = sess.run(x)
print(x_val)

Output:

Tensor("Const:0", shape=(), dtype=float32)
Tensor("Const_1:0", shape=(), dtype=float32)
Tensor("Add:0", shape=(), dtype=float32)
7.0
import tensorflow as tf

a = tf.constant([3.0, -2.7, 1.2])
b = tf.constant([4.0, 5.1, -1.7])
x = tf.add(a, b)

print(a)
print(b)
print(x)

sess = tf.Session()
x_val = sess.run(x)
print(x_val)

Output:

Tensor("Const_2:0", shape=(3,), dtype=float32)
Tensor("Const_3:0", shape=(3,), dtype=float32)
Tensor("Add_1:0", shape=(3,), dtype=float32)
[ 7.0 2.4 -0.5 ]
Tensor Types
I Different element types are represented by tf.DType:
dtype description
tf.float16 16-bit half-precision floating-point
tf.float32 32-bit single-precision floating-point
tf.float64 64-bit double-precision floating-point
tf.bfloat16 16-bit truncated floating-point
tf.complex64 64-bit single-precision complex
tf.complex128 128-bit double-precision complex
tf.int8 8-bit signed integer
tf.uint8 8-bit unsigned integer
tf.uint16 16-bit unsigned integer
tf.int16 16-bit signed integer
tf.int32 32-bit signed integer
tf.int64 64-bit signed integer
tf.bool Boolean
tf.string String
tf.qint8 Quantized 8-bit signed integer
tf.quint8 Quantized 8-bit unsigned integer
tf.qint16 Quantized 16-bit signed integer
tf.quint16 Quantized 16-bit unsigned integer
tf.qint32 Quantized 32-bit signed integer
tf.resource Handle to a mutable resource
Standard operators are overloaded for tensors, so tf.add(a, b) can be written as a + b:

import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(4.0)
x = a + b

print(x)

sess = tf.Session()
x_val = sess.run(x)
print(x_val)

Output:

Tensor("add_1:0", shape=(), dtype=float32)
7.0
2. Variables

Variables
import tensorflow as tf

x = tf.Variable(3.0)
sess = tf.Session()
sess.run( x.initializer )
x_val = sess.run(x)
print(x_val)

x_plus_one = tf.assign_add(x, 1.0)

for t in range(5):
    x_val = sess.run(x_plus_one)
    print(t, x_val)

Output:

3.0
0 4.0
1 5.0
2 6.0
3 7.0
4 8.0
import tensorflow as tf

x = tf.Variable(3.0)
y = tf.Variable( tf.multiply(tf.constant(2.0),
                             x.initialized_value()) )
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
y_val = sess.run(y)
print(y_val)

Output:

6.0
3. Example: Linear Regression
ŷ := β₀ + X β                            prediction
r := y − ŷ                               residuum
ℓ := (1/2) ∑_{n=1}^N r_n²                loss/error
−∂ℓ/∂β := Xᵀ r                           negative gradient w.r.t. β
−∂ℓ/∂β₀ := ∑_{n=1}^N r_n                 negative gradient w.r.t. β₀
β^next := β − η ∂ℓ/∂β                    update of β
β₀^next := β₀ − η ∂ℓ/∂β₀                 update of β₀
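These update rules can be sketched directly in NumPy (the data, learning rate, and iteration count below are made up for illustration; TensorFlow expresses the same computation as a graph):

```python
import numpy as np

# Toy data (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 2.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

beta0 = 0.0
beta = np.zeros(2)
eta = 0.01  # learning rate

for step in range(1000):
    y_hat = beta0 + X @ beta          # prediction
    r = y - y_hat                     # residuum
    loss = 0.5 * np.sum(r ** 2)       # loss/error
    beta = beta + eta * (X.T @ r)     # beta  = beta  - eta * dloss/dbeta
    beta0 = beta0 + eta * np.sum(r)   # beta0 = beta0 - eta * dloss/dbeta0

print(beta0, beta, loss)
```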
4. Automatic Gradients

tf.gradients(ys, xs, ...) adds nodes to the graph that compute the gradients of ys with respect to xs.
To compute ∂y/∂x:
I find all paths p^1, . . ., p^K ∈ G* in the graph G from x to y
I each operation o := p_l^k has to provide its gradient ∂o/∂i for each of its inputs i
I then ∂p_l^k / ∂p_{l−1}^k = ∂o/∂i for i = p_{l−1}^k
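A hand-worked sketch of this rule in plain Python (no TensorFlow), for the single path x → m → y with y = 2·x + 1: each operation supplies its local gradient, and multiplying the local gradients along the path yields ∂y/∂x.

```python
# Forward operations along the path x -> m -> y.
def mul(a, b): return a * b
def add(a, b): return a + b

# Each operation provides its gradient w.r.t. each of its inputs.
def mul_grad_a(a, b): return b      # d(a*b)/da
def add_grad_a(a, b): return 1.0    # d(a+b)/da

x = 3.0
m = mul(x, 2.0)   # m = 6.0
y = add(m, 1.0)   # y = 7.0

# Chain rule along the path: dy/dx = (dy/dm) * (dm/dx) = 1 * 2.
dy_dx = add_grad_a(m, 1.0) * mul_grad_a(x, 2.0)
print(dy_dx)
```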
5. Large Data I: Feeding
import tensorflow as tf

a = tf.placeholder(shape=(), dtype=tf.float32)
b = tf.placeholder(shape=(), dtype=tf.float32)
x = a + b

sess = tf.Session()
print( sess.run(x, {a: 3, b: 7}) )
print( sess.run(x, {a: -2, b: 4}) )

Output:

10.0
2.0
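Conceptually, placeholders turn the graph into a function of its inputs: each sess.run call supplies concrete values via the feed dictionary. A plain-Python analogue of the example above:

```python
# The graph x = a + b behaves like a function of its
# placeholders; each "run" feeds different input values.
def run_add_graph(a, b):
    return a + b

print(run_add_graph(3.0, 7.0))    # corresponds to feeding {a: 3, b: 7}
print(run_add_graph(-2.0, 4.0))   # corresponds to feeding {a: -2, b: 4}
```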
6. Large Data II: Reader Nodes
Reader Node
file lr-data.csv:

X1, X2, Y
2, 1, +1
1, 2, +1
4, 3, -1
3, 4, -1

import tensorflow as tf

data_files = ['lr-data.csv']; eta_data = 0.01

filename_queue = tf.train.string_input_producer(data_files)
reader = tf.TextLineReader(skip_header_lines=1)
_, line = reader.read(filename_queue)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=sess)

for t in range(6):
    line_val = sess.run(line)
    print(t, line_val)

coord.request_stop()
coord.join(threads)

Output:

0 b'2, 1, +1'
1 b'1, 2, +1'
2 b'4, 3, -1'
3 b'3, 4, -1'
4 b'2, 1, +1'
5 b'1, 2, +1'
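The reader's behaviour (skip the header, then deliver the data lines cyclically, restarting after the last line) can be mimicked in plain Python; this is an illustration only, not the TensorFlow API:

```python
import itertools

# In-memory stand-in for the file lr-data.csv.
csv_text = "X1, X2, Y\n2, 1, +1\n1, 2, +1\n4, 3, -1\n3, 4, -1"
lines = csv_text.splitlines()[1:]   # skip_header_lines=1
reader = itertools.cycle(lines)     # endless cyclic delivery

for t in range(6):
    print(t, next(reader))
```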
7. Debugging
Summary (1/3)
I Two phases:
I graph construction
I executing (parts of) the graph (running)
Summary (2/3)
I Nodes can be distributed over different devices.
I cores of a CPU, GPUs, different compute nodes
I automatic placement based on cost heuristics
I eligible: sufficient memory available
I expected runtime
— based on cost heuristics
— possibly also based on past runs
I expected time for data movement between devices
Summary (3/3)
I Gradients can be computed automatically.
I simply using the chain rule
I and explicit gradients for all elementary operations.
I gradients add nodes to the graph.
Further Readings
References I
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.