
UNIT 1

FOUNDATIONS
Ms Devibala Subramanian
Assistant Professor
PG & Research Department of Computer Science
Sri Ramakrishna College of Arts and Science
Coimbatore
Deep learning is a subset of machine learning that uses artificial neural networks to model and
process complex patterns in data. It is inspired by the human brain's structure and function,
consisting of multiple layers of interconnected neurons. These networks learn features from data
automatically, with training driven by backpropagation and optimization techniques such as
gradient descent.

Key Features of Deep Learning:

Multi-layered Networks: Includes architectures like Convolutional Neural Networks (CNNs) for
image processing and Recurrent Neural Networks (RNNs) for sequential data.

Feature Learning: Extracts relevant patterns from raw data without manual feature engineering.

Scalability: Works efficiently with large datasets and high computational power.

Applications: Used in fields like image recognition, natural language processing (NLP), speech
recognition, autonomous vehicles, and healthcare.

Deep learning has revolutionized AI by enabling advanced capabilities such as real-time language
translation, medical diagnosis, and self-driving technology.
Deep Learning AI mimics the intricate neural networks of the human brain, enabling computers to
autonomously discover patterns and make decisions from vast amounts of unstructured data.

This transformative field has propelled breakthroughs across various domains, from computer
vision and natural language processing to healthcare diagnostics and autonomous driving.

What is Deep Learning?

Deep learning is the branch of machine learning that is based on artificial neural network
architectures.

An artificial neural network or ANN uses layers of interconnected nodes called neurons that work
together to process and learn from the input data.

In a fully connected deep neural network, there is an input layer and one or more hidden layers
connected one after the other.

Each neuron receives input from the previous layer neurons or the input layer.

The output of one neuron becomes the input to other neurons in the next layer of the network, and
this process continues until the final layer produces the output of the network.
The layers of the neural network transform the input data through a series of nonlinear
transformations, allowing the network to learn complex representations of the input data.

Deep learning can be used for supervised, unsupervised, and reinforcement machine learning, and it
uses a variety of methods to process each.
Functions

A function is one of the fundamental building blocks in both mathematics and deep learning.
Understanding functions is crucial because neural networks are essentially large, complex functions
that map inputs to outputs.

A function is a rule or process that takes an input (or multiple inputs), applies some transformation
to it, and produces an output. Mathematically, a function is written as:

• f(x) = y
Where:
• x is the input (also called the independent variable).
• f is the function that transforms x.
• y is the output (also called the dependent variable).
Example Functions:
Simple Squaring Function
• f(x) = x²
• Input: x = 3
• Output: f(3) = 3² = 9

ReLU (Rectified Linear Unit) Function

• f(x) = max(0, x)
• Input: x = −5, Output: f(−5) = 0
• Input: x = 7, Output: f(7) = 7
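
In code, a minimal NumPy sketch of ReLU (np.maximum applies the comparison element-wise):

import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # Element-wise maximum of 0 and x: negative inputs become 0
    return np.maximum(0, x)

x = np.array([-5, 7])
print(relu(x))  # Output: [0 7]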

Functions can be represented in three main ways:


• Mathematical Representation (Equations)
• Graphical Representation (Diagrams)
• Computational Representation (Code)
Types of Function Representations

(A) Mathematical Representation

• Mathematically, functions describe relationships between variables. A common function type
used in deep learning is the linear function:
• Linear Function: f(x) = ax + b
(B) Graphical Representation
• Functions can be visualized as plots in a coordinate system. This helps understand their
behavior.
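For example (an illustrative sketch; matplotlib is assumed here and is not part of the original
slides), the squaring function can be plotted like this:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)          # evenly spaced inputs
plt.plot(x, x ** 2, label="f(x) = x^2")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.show()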
(C) Computational Representation (Code)
In Python, functions can be implemented using the def keyword. The Deep Learning from Scratch
book uses NumPy to efficiently handle multi-dimensional data.
Example 1: Simple Squaring Function

import numpy as np

def square(x: np.ndarray) -> np.ndarray:
    """
    Computes the square of each element in the input array.
    """
    return np.power(x, 2)

x = np.array([1, 2, 3, 4])
print(square(x))  # Output: [ 1  4  9 16]
3. Composite (Nested) Functions
• In deep learning, functions are often combined to form nested (or composite) functions.
Mathematically, f(g(x)) means apply g first and then f; for example, f(g(x)) = √(x²) = |x|:

def nested_function(x: np.ndarray) -> np.ndarray:
    # sqrt(x^2) equals the absolute value of x
    return np.sqrt(np.power(x, 2))

x = np.array([-3, -2, -1, 0, 1, 2, 3])
print(nested_function(x))  # Output: [3 2 1 0 1 2 3]
Functions and Their Derivatives (Gradients)
• To train deep learning models, we need to compute derivatives of functions.

def derivative(func, x: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # Central-difference approximation of the derivative of func at x
    return (func(x + delta) - func(x - delta)) / (2 * delta)

print(derivative(square, np.array([3])))  # Output: ~6, since f'(x) = 2x

Why do we need derivatives?

• They help compute gradients, which guide optimization in neural networks.
• The chain rule allows us to compute derivatives for composite functions.
• Gradient Descent uses derivatives to update model parameters, as the sketch below illustrates.
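
As a minimal illustration (not from the original slides), here is one gradient descent step on
f(x) = x², reusing the square and derivative functions defined above; the learning rate of 0.1 is
an arbitrary choice:

x = np.array([3.0])
learning_rate = 0.1               # illustrative step size
grad = derivative(square, x)      # approximately f'(3) = 6
x = x - learning_rate * grad      # step against the gradient
print(x)  # ~[2.4], closer to the minimum of x^2 at x = 0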
The chain rule
It is a fundamental concept in calculus that is essential for deep learning. It allows us to compute the
derivative of a function that is composed of multiple functions. Since neural networks consist of
layers of functions applied in sequence, the chain rule plays a crucial role in training them using
backpropagation.
• The chain rule states that if a function f(x) is composed of multiple functions, its derivative can
be computed by multiplying the derivatives of the individual functions.
Mathematical Definition
• If we have two functions:
• y = g(x)
• z = f(y), so that z = f(g(x)),
• then the derivative of z with respect to x is:
dz/dx = f'(g(x)) · g'(x)

To apply the chain rule:
• First, compute the derivative of g(x) (inner function).
• Then, compute the derivative of f(y) (outer function) at y = g(x).
• Multiply these derivatives to get dz/dx.
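
For instance (a worked example added for illustration): if y = g(x) = x² and z = f(y) = sin(y),
then dy/dx = 2x and dz/dy = cos(y), so dz/dx = cos(x²) · 2x.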
Visualizing the Chain Rule

(A) Function Composition as a Computational Graph


Think of a function composition as a series of transformations, represented as a computational
graph.
• Example: f(g(x))
• First, the input x goes through function g to produce an intermediate result y.
• Then, y is passed into function f, which produces the final output z.
• x → [ g(x) ] → y → [ f(y) ] → z.

Each arrow represents a transformation. The derivative of the final output z with respect to x is
computed using the chain rule.
Geometric Intuition (Slope Interpretation)
The derivative of a function at a point is the slope of its graph there, and the chain rule
multiplies the slopes of the inner and outer functions along the composition.

In deep learning, neural networks consist of multiple layers, where each layer applies a function to
the output of the previous layer. Training a neural network requires computing the derivative of a loss
function with respect to each layer's parameters using the chain rule.

Neural Network Perspective

A simple two-layer neural network can be written as a composition of functions:

z = f₂(f₁(x))

Each function represents a layer transformation. To compute the gradient, apply the chain rule:

dz/dx = f₂'(f₁(x)) · f₁'(x)
Example: Computing the Derivative of sigmoid(square(x))

import numpy as np

def square(x):
    return np.power(x, 2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivative(func, x, delta=0.001):
    return (func(x + delta) - func(x - delta)) / (2 * delta)

x = np.array([1.0, 2.0, 3.0])  # Input values

# Step 1: Compute intermediate value y = square(x)
y = square(x)

# Step 2: Compute gradients
dy_dx = derivative(square, x)   # g'(x) = 2x
dz_dy = derivative(sigmoid, y)  # f'(y) = sigmoid(y) * (1 - sigmoid(y))

# Step 3: Apply Chain Rule
dz_dx = dz_dy * dy_dx  # Multiply gradients

print("Gradient of sigmoid(square(x)) with respect to x:", dz_dx)

Generalizing the Chain Rule

For a longer chain z = fₙ(...f₂(f₁(x))...), the same idea applies: multiply the derivatives of all
the functions in the chain, each evaluated at its own input.

Functions with Multiple Inputs

A function with multiple inputs takes two or more independent variables and maps them to an
output.

Mathematical Definition

A function with two inputs x and y can be written as:


f(x,y)=some operation on x and y

Example Functions with Two Inputs:

Addition: f(x, y) = x + y

Multiplication: f(x, y) = x * y

Weighted Sum (Common in Deep Learning): f(x, y) = w₁x + w₂y

Here, w₁ and w₂ are weights that control the influence of x and y.


Python Implementation of a Function with Two Inputs

Function with Addition

import numpy as np

def add_function(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return x + y

x = np.array([2, 3])
y = np.array([4, 5])

print(add_function(x, y))  # Output: [6 8]


Functions with Multiple Inputs

Weighted Sum (Linear Combination)

In deep learning, a neuron in a neural network takes multiple inputs, multiplies them by weights, and
sums them.

Mathematical Representation:

f(x, w) = x · w = Σᵢ xᵢwᵢ

def weighted_sum(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Dot product: sum of element-wise products of inputs and weights
    return np.dot(x, w)

x = np.array([1, 2, 3])
w = np.array([0.1, 0.2, 0.3])

print(weighted_sum(x, w))  # Output: 1*0.1 + 2*0.2 + 3*0.3 = 1.4


Backpropagation in Neural Networks
In deep learning, functions with multiple inputs appear in each neuron of a neural network.

Example: One Neuron in a Neural Network

A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

z = w · x + b,  a = σ(z)
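
A minimal NumPy sketch of such a neuron (the sigmoid activation and the specific weights, inputs,
and bias are illustrative assumptions):

import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    z = np.dot(w, x) + b           # weighted sum of inputs plus bias
    return 1 / (1 + np.exp(-z))    # sigmoid activation

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, 0.2, 0.3])
b = 0.5
print(neuron(x, w, b))  # sigmoid(1.4 + 0.5) = sigmoid(1.9) ≈ 0.87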


Functions with Multiple Vector Inputs

A function with multiple vector inputs takes two or more vectors as inputs and produces an output,
which can be a scalar, vector, or matrix.
Mathematical Representation
If x and y are input vectors, a function with multiple vector inputs can be written as:

f(x,y)=some operation on x and y

Example Functions with Two Vector Inputs:

• Dot product: f(x, y) = x · y (produces a scalar)
• Element-wise multiplication: f(x, y) = x ⊙ y (produces a vector)
Python Implementation of Functions with Multiple Vector Inputs
Example 1: Dot Product Function
The dot product computes a weighted sum of two vectors.

import numpy as np

def dot_product(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return np.dot(x, y)

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(dot_product(x, y))  # Output: 1*4 + 2*5 + 3*6 = 32

Example 2: Element-wise Multiplication

def elementwise_multiply(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return x * y

print(elementwise_multiply(x, y))  # Output: [ 4 10 18]


Matrix Functions with Multiple Vector Inputs

In deep learning, most functions operate on matrices, not just vectors.

Example: Matrix Multiplication
If X is an input matrix and W is a weight matrix, then the function
f(X, W) = X · W performs a linear transformation.

Python Implementation

def matrix_multiply(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    return np.dot(X, W)

X = np.array([[1, 2], [3, 4]])       # Shape (2, 2)
W = np.array([[0.5, 1], [1.5, -1]])  # Shape (2, 2)

print(matrix_multiply(X, W))  # Output: [[ 3.5 -1. ]
                              #          [ 7.5 -1. ]]
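
In a neural network layer, each row of X can be read as one input example and each column of W as
the weights of one neuron, so X · W computes every neuron's weighted sum for every example at once.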
Creation of New Features from Existing Features

In a dataset, each feature represents an attribute or characteristic of the data. However, sometimes the
raw features may not be enough to capture useful patterns. By transforming or combining existing
features, we can create new, more informative features that improve learning.

Example in Deep Learning

Suppose we have two features:

x₁ = house size (square feet)
x₂ = number of rooms

A new, more informative feature can be:

Rooms per square foot: x₃ = x₂ / x₁


Example: Predicting House Prices

Suppose we are building a house price prediction model. The dataset contains:
• Size (sq ft)
• Number of rooms
• Age of the house
• Distance to city center

Creating New Features:

import pandas as pd

houses = pd.DataFrame({
    'Size': [1000, 2000, 1500],
    'Rooms': [3, 5, 4],
    'Age': [5, 20, 50],
    'Distance': [2, 10, 5]
})

# Creating new features
houses['Rooms_per_sqft'] = houses['Rooms'] / houses['Size']  # room density
houses['Is_Old'] = (houses['Age'] > 30).astype(int)          # binary flag for old houses
houses['City_Proximity'] = 1 / houses['Distance']            # higher means closer to city

print(houses)
Feature Creation in Deep Learning

Deep learning models automatically learn new features, but feature engineering can still help.

(A) Creating Features for Neural Networks

In deep learning, new features are often created through layers:


1. Convolutional Layers (CNNs) extract new spatial features from images.
2. Recurrent Layers (RNNs) capture temporal patterns from sequences.
3. Autoencoders create compressed representations of inputs.

Example: Creating New Features in a Neural Network

import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)  # Creates 5 new features from 10 inputs

    def forward(self, x):
        return self.fc(x)
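
A brief usage sketch (the random input is purely illustrative):

import torch

model = FeatureExtractor()
x = torch.randn(1, 10)   # one sample with 10 raw input features
features = model(x)      # 5 learned features
print(features.shape)    # torch.Size([1, 5])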
Derivatives of Functions with Multiple Vector Inputs

For a function f(x,y) with two vector inputs x and y, we need to compute the partial derivatives with
respect to each input.
Derivatives help in:

• Gradient Descent: Updating model parameters to minimize loss.
• Backpropagation: Computing gradients efficiently in neural networks.
• Optimization: Using gradients to adjust weights (see the sketch below).
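
As a small worked sketch (added for illustration, reusing the dot product from earlier): for
f(x, y) = x · y, the partial derivative with respect to x is y, and with respect to y is x.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# f(x, y) = x . y  =>  df/dx = y, df/dy = x
grad_x = y  # gradient of the dot product with respect to x
grad_y = x  # gradient of the dot product with respect to y
print(grad_x, grad_y)  # [4. 5. 6.] [1. 2. 3.]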

Computational Graph with 2D Matrix Inputs

A computational graph is a directed graph where:

• Nodes represent mathematical operations or variables.
• Edges represent dependencies between operations.

Example (Scalar Computation):

If z = (x + y)², the computational graph is:

x → [ + ] → a → [ ^2 ] → z
y → [ + ]

This helps track computations for derivatives (gradients).
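
A minimal sketch (added for illustration) of evaluating this graph forward and then walking it
backward with the chain rule:

x, y = 2.0, 3.0

# Forward pass through the graph
a = x + y      # a = 5.0
z = a ** 2     # z = 25.0

# Backward pass: chain rule along the edges
dz_da = 2 * a  # derivative of a^2 with respect to a
da_dx = 1.0    # derivative of x + y with respect to x
dz_dx = dz_da * da_dx
print(z, dz_dx)  # 25.0 10.0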
