
LFD 1 - Concepts of Sections (Data Science)

Artificial Intelligence (AI):

- The overarching umbrella: AI refers to the broad concept of creating intelligent machines that can think, learn, and perform tasks that typically require human intelligence.

- Goals: achieve human-like intelligence, understand and solve complex problems, adapt to new situations.

Machine Learning (ML):

- A method within AI: ML focuses on the ability of machines to learn from data without explicit programming.

- Goals: identify patterns, make predictions, improve performance over time with more data.

Deep Learning (DL):

- A subfield of ML: DL uses artificial neural networks inspired by the human brain to learn from large amounts of data, particularly unstructured data like images and text.

- Goals: solve complex problems requiring pattern recognition, feature extraction, and representation learning.
Analogies:

The Language Learning Analogy:

- AI: Learning a new language in general, the ability to communicate and understand information in a different way.

- ML: Learning vocabulary and basic grammar, acquiring the building blocks of language through repeated exposure and practice.

- DL: Mastering advanced language skills like comprehension of complex nuances, creative expression, and adapting to different contexts, like speaking a foreign language fluently.

The Map Analogy:

- AI: Finding your way in a city, the overall ability to navigate and reach your destination.

- ML: Using a basic map with directions, a straightforward approach to getting from point A to point B.

- DL: Using a real-time GPS navigation system, constantly adapting to traffic conditions, detours, and dynamic situations to optimize your route.

Machine Learning Families: Supervised Learning, Unsupervised Learning, Reinforcement Learning, Deep Learning.
1. Supervised Learning: In this family of algorithms, the models are trained on labeled data, where the output is known, to learn the relationship between the input features and the output variable.
   o Examples:
     - Classification: predicting a category (e.g., spam vs. not spam, image recognition).
     - Regression: predicting a continuous value (e.g., house prices, stock prices).
   o Common algorithms (a minimal sketch follows the list):
     - Linear regression
     - Logistic regression
     - Support vector machines (SVM)
     - Neural networks
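
A minimal scikit-learn sketch of both supervised tasks (the toy data and values are illustrative, not from the notes):

from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value from one feature
X = [[1], [2], [3], [4]]        # input feature (e.g., house size)
y = [10.0, 20.0, 30.0, 40.0]    # known continuous target (labeled data)
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))       # -> approximately [50.0]

# Classification: predict a category (0 or 1)
X_c = [[0.1], [0.4], [2.1], [2.5]]
y_c = [0, 0, 1, 1]              # known class labels
clf = LogisticRegression().fit(X_c, y_c)
print(clf.predict([[2.0]]))     # -> [1]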

2. Unsupervised Learning: In this family of algorithms, the models are trained on unlabeled data, where the output is unknown, to identify patterns and structures in the data.
   o Examples:
     - Clustering: grouping similar data points together (e.g., customer segmentation).
   o Common algorithms (a minimal sketch follows the list):
     - Clustering (K-Means Clustering, Hierarchical Clustering)
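
A minimal clustering sketch with scikit-learn's KMeans (toy unlabeled points, assumed for illustration):

from sklearn.cluster import KMeans

# Unlabeled points: no target column, the model finds structure on its own
X = [[1, 1], [1.2, 0.9], [8, 8], [8.1, 7.9]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g., [0, 0, 1, 1]: two discovered groups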


3. Reinforcement Learning: agents learn to make decisions based on the feedback they receive from their environment (rewards or penalties, through trial and error).

4. Deep Learning: This family of algorithms is a subset of machine learning that uses deep neural networks to model complex relationships between inputs and outputs. Deep learning has achieved state-of-the-art results in various domains such as image classification, speech recognition, and natural language processing.

5. Other Families:
Data, Training, Prediction, Actual value

Tabular Data (SQL)

Data is written in a CSV file (with the .csv extension)

Product table:

ID   color (x1)   quality (x2)   version (x3)   price (y)
1    Red          B              4              22
2    Green        A              3              32
3    Blue         C              1              14
4    Red          A              5              32
5    Blue         C              3              26
6    Blue         A              2              23

x  Features of the product

Y  Actual value - Target

F(X)  prediction
Machine Learning Cycle

Data Exploration → report → generated by installing pandas-profiling (a Python library)
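
A minimal sketch of generating such a report (pandas-profiling has been renamed ydata-profiling in recent releases; the file name products.csv is hypothetical):

import pandas as pd
from ydata_profiling import ProfileReport   # pip install ydata-profiling
# from pandas_profiling import ProfileReport  # older package name

df = pd.read_csv("products.csv")            # hypothetical dataset
ProfileReport(df, title="Data Exploration Report").to_file("report.html")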

Drop a variable (column) when it is:

1) A uniformly distributed variable
2) A variable with no effect on the prediction
3) Missing in more than 50% of rows (or a predefined threshold)

Handle the remaining missing values (a minimal sketch follows below):

1) Numerical variables → mean

2) Categorical variables → mode

Outliers: remove the rows that contain outliers from the dataset.
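
A minimal pandas sketch of these rules (column names like price and color are hypothetical; the 1.5 * IQR rule is one common way to flag the outliers the notes say to remove):

import pandas as pd

df = pd.read_csv("products.csv")  # hypothetical dataset

# Drop columns whose missing fraction exceeds a predefined threshold (50% here)
df = df.loc[:, df.isna().mean() <= 0.5]

# Numerical variables -> fill with the mean
df["price"] = df["price"].fillna(df["price"].mean())

# Categorical variables -> fill with the mode
df["color"] = df["color"].fillna(df["color"].mode()[0])

# Outliers: keep only rows inside 1.5 * IQR of the price column
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]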

Data Preprocessing:

- Numerical variables:
  - Normalization → Min-Max scaling
  - Standardization → Z-score

- Categorical variables:
  - Ordinal variables → Label Encoding
  - Nominal variables → One-Hot Encoding


Label Encoding (ordinal example):

Grade   Encoded
A+      1
C       6
B+      3
A       2
B       4

One-Hot Encoding (nominal example):

Material    Cotton Material   Polyester Material   Melton Material
cotton      1                 0                    0
polyester   0                 1                    0
melton      0                 0                    1
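
A minimal encoding sketch with scikit-learn and pandas (toy columns assumed; note that LabelEncoder assigns codes alphabetically, so a manual mapping is needed if the true ordinal order matters):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"grade": ["A+", "C", "B+", "A", "B"],
                   "material": ["cotton", "polyester", "melton", "cotton", "melton"]})

# Label Encoding for the ordinal variable (codes are assigned alphabetically)
df["grade_code"] = LabelEncoder().fit_transform(df["grade"])

# One-Hot Encoding for the nominal variable (one indicator column per material)
df = pd.get_dummies(df, columns=["material"])
print(df)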

Transform features into Normal/Gaussian Distribution

1) Log Transformation
2) Square root Transformation
3) Reciprocal Transformation
4) Exponential Transformation
5) Box-Cox Transformation
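
A minimal sketch of these five transformations with numpy/scipy (toy right-skewed data assumed):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # right-skewed toy feature

log_t = np.log(x)             # 1) Log transformation
sqrt_t = np.sqrt(x)           # 2) Square root transformation
rec_t = 1.0 / x               # 3) Reciprocal transformation
exp_t = np.exp(x)             # 4) Exponential transformation
box_t, lam = stats.boxcox(x)  # 5) Box-Cox (requires strictly positive values)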
Modeling
The model function for linear regression (algorithm):

Linear regression makes a mapping (linear relationship) between the variables and the target variable.

Start with initial values for w and b:

f(x(i)) = w * x(i) + b

f(x(i)) → prediction for a specific row

i → row number

x → the feature

w → weight (slope)

b → intercept (bias)

ID   version   price
1    4         32
2    3         22

f(x(1)) = w * x(1) + b

f(x(2)) = w * x(2) + b

Product table with three weights (w1, w2, w3):

ID   box (x1)   quality (x2)   version (x3)   price (y)
1    1          3              4              22
2    0          1              3              32
3    1          2              1              14

f(x) = w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b → prediction of the first row (using its feature values)
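
A minimal numpy sketch of this multi-feature prediction (the weights and bias are hypothetical initial values):

import numpy as np

X = np.array([[1, 3, 4],       # box, quality, version for row 1
              [0, 1, 3],
              [1, 2, 1]])
w = np.array([0.5, 2.0, 3.0])  # hypothetical initial weights
b = 1.0                        # hypothetical initial bias

predictions = X @ w + b        # f(x) for every row at once
print(predictions)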
Red line  prediction

Dots  Actual values

Mean Squared Error (MSE) - Cost Function

Y hat  f(X)
Gradient descent
Iteratively change the values of w and b to decrease the cost function, improving the prediction f(x):

w = w - alpha * dJ/dw,   b = b - alpha * dJ/db   (alpha → learning rate, J → cost function)
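
A minimal gradient-descent sketch for f(x) = w * x + b with an MSE cost (the learning rate and toy data are illustrative):

import numpy as np

x = np.array([4.0, 3.0, 1.0])
y = np.array([32.0, 22.0, 14.0])

w, b, lr = 0.0, 0.0, 0.01       # initial w and b, learning rate

for _ in range(1000):
    pred = w * x + b            # f(x) for all rows
    error = pred - y
    w -= lr * (2 / len(x)) * (error * x).sum()  # dMSE/dw
    b -= lr * (2 / len(x)) * error.sum()        # dMSE/db

print(w, b)  # parameters that minimize the MSE on this toy data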
Overfitting and Underfitting

Underfitting can happen when the learning rate is too big or the dataset is too small.

Overfitting usually happens on the training data (big dataset / many features → high complexity). We then use ways to make the model generalize:

1) Collect more data

2) Use early stopping

3) Regularization

It's good to add features to the model to increase accuracy, but we should apply regularization to avoid overfitting.
Regularization shrinks the less important weights toward small values (it penalizes large weights in the cost function).

The lambda (regularization strength) value here is small, from 0 to 0.1.
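
A minimal L2-regularization sketch using scikit-learn's Ridge, where alpha plays the role of lambda (toy data assumed):

from sklearn.linear_model import Ridge

X = [[1, 3, 4], [0, 1, 3], [1, 2, 1]]
y = [22, 32, 14]

model = Ridge(alpha=0.1).fit(X, y)    # alpha = regularization strength (lambda)
print(model.coef_, model.intercept_)  # penalized weights stay smaller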


Logistic Regression
Linear regression makes a mapping (linear relationship) between the variables and the target variable.

In real life, models are more complex than this (non-linear relationships).

Logistic regression is used for classification and prediction analysis.

Logistic regression captures non-linear relationships between the variables and makes predictions for an event occurring with binary outcomes (two possible values, e.g., Yes or No, 1 or 0); the output is a probability.

Our scope is binary classification (0 or 1), like the Titanic dataset (as opposed to multi-class classification).

Logistic regression uses an activation function (the sigmoid function; ReLU is another activation function) to classify the output (output between 0 and 1).

An activation function is a more general concept used to add non-linearity (transforming the linear relationship into a non-linear relationship).

The input x of the activation function is the output of the linear regression function (w * x + b).


A sigmoid function is a general term for a type of mathematical function that produces an S-shaped curve:

sigmoid(x) = 1 / (1 + e^(-x))

The logistic function is a specific type of sigmoid function (an activation function); the sigmoid function is often called the logistic function.

ReLU is the most common activation function in neural networks.

A probability threshold (usually 0.5 or higher) is used to make a binary decision about the outcome, 0 or 1 (used in classification).
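
A minimal sketch of the sigmoid plus the probability threshold (the linear output z is a made-up value):

import numpy as np

def sigmoid(z):
    """Map the linear output z = w*x + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = 1.2                # hypothetical output of the linear part
p = sigmoid(z)         # probability of class 1 (about 0.77 here)
label = int(p >= 0.5)  # probability threshold -> binary decision
print(p, label)        # 0.768..., 1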
SVM
Support Vector Machine (SVM) is an algorithm used in classification problems where the goal is to divide the data points into two or more categories based on their features, by finding a hyperplane (a boundary that divides the data) that separates the different classes.

The hyperplane creates a margin, and SVM tries to push it outward to get a cleaner boundary.

We can increase the dimensions/features of the data, but this may cause the curse of dimensionality (overfitting), because we increase the number of dimensions/features.

When we can't separate the data with a line, we increase the dimensions and separate it with a plane.
- Linearly Separable Data:

Data is linearly separable if it is possible to completely separate the different classes in a linear manner, using:

1) a straight line (in two dimensions/features)

2) a plane (in three dimensions/features)

3) a hyperplane (in more than three dimensions/features)

For example, in a binary classification problem where the data has two or more features, linearly separable data might look like two distinct clusters on a scatter plot with a clear gap between them.

- Non-linearly Separable Data:

Data is non-linearly separable if a straight line, plane, or hyperplane can't completely separate the different classes in the original feature space.

A non-linear decision boundary is needed to capture the complex relationships between features and classes; Support Vector Machines (SVMs) can do this.

In many real-world scenarios, data is non-linearly separable.

SVM is usually used for classification; it can be used as a regressor, but that isn't common.

SVMs are often used in image classification problems, where the goal is to classify images into one of several pre-defined categories.

SVMs can be used for both binary and multi-class classification problems in image analysis.

How does SVM do this classification?

- By an equation (a kernel) that increases the dimensions without causing problems like increased training time (using the dot product).
- The kernel function determines how the input data (features) is transformed from its original dimension into a higher-dimensional space, so that non-linearly separable data might become linearly separable there (effective for capturing complex relationships).
Common types of kernel functions:

1. Linear Kernel:
   - Linear decision boundary (two classes).

2. Polynomial Kernel:
   - Non-linear decision boundary; fits well on datasets with non-linearly separable classes. Parameters c and d control the shift and degree of the polynomial.

3. Radial Basis Function (RBF) Kernel:
   - Also known as the Gaussian kernel; effective for capturing complex relationships in the data by transforming it into a higher-dimensional space.
   - RBF measures the similarity (distance) between data points in a higher-dimensional space without explicitly calculating the transformed features; this captures the relationships between data points in a way that is effective for handling non-linear separations.
   - Has parameters that control underfitting and overfitting (acting as regularization parameters).
   - Commonly used due to its ability to model complex decision boundaries once the optimal hyperparameters are found.

Problem: the model can be biased toward, or misleading about, some class.

Parameters of the RBF kernel (a minimal sketch follows the list):

1) C
   - How much the SVM tries to avoid misclassifying the training data.
   - A larger C means a higher penalty for misclassification, so a more complex decision boundary (fits the data better, giving better training accuracy).
   - A smaller C means a lower penalty for misclassification, so a simpler decision boundary (generalizes better, but with less training accuracy).

2) gamma
   - Controls the shape of the decision boundary: how much influence each training example has on it.
   - A larger gamma means a narrower RBF kernel (each example has a small region of influence). It can capture the local shape of the data because the decision boundary becomes more sensitive to individual data points (a more complex boundary), leading to a more accurate model on the training data but possibly overfitting.
   - A smaller gamma means a wider RBF kernel (each example has a large region of influence). It captures the global shape of the data (and could underfit).
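
A minimal RBF-kernel SVM sketch with scikit-learn, showing where C and gamma go (toy data assumed):

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 0], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# C: penalty for misclassification; gamma: region of influence per example
clf = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X, y)
print(clf.predict([[1, 0], [9, 8]]))  # -> [0, 1]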

Hyperparameter tuning → try many combinations to identify the best C and gamma for the model.

Accuracy is not the only metric used to evaluate classification, and it is one of the weakest.

Common techniques for handling non-linearly separable data include:

1) Kernelized models: using non-linear kernels
2) Non-linear classifiers: using algorithms such as decision trees, random forests, k-nearest neighbors, or deep neural networks
Hyperparameter Tuning
Used to tune hyperparameters for better model performance and to reduce the risk of overfitting.

Example search grids: C: [0.1, 1, 5, 10, 100], gamma: [0.01, 0.1, 0.2, 1, 10]

Techniques for hyperparameter tuning:

1) Cross-validation

2) Grid search

3) Random search

4) Bayesian optimization

They can be implemented using different functions or algorithms.

Cross-Validation
1) Split the training data into multiple subsets (training and validation subsets)
2) The model is trained on some subsets and validated on the remaining ones
3) This process is repeated multiple times (folds)
4) Performance metrics are averaged across the folds (the average picks the best C and gamma over all folds)
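
A minimal sketch combining grid search with cross-validation via scikit-learn's GridSearchCV (the C and gamma grids are the ones listed above; the toy data is illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 5, 10, 100],
              "gamma": [0.01, 0.1, 0.2, 1, 10]}

X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
     [8, 8], [9, 8], [8, 9], [9, 9], [8.5, 8.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# cv=5: 5-fold cross-validation; scores are averaged across the folds
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # best C and gamma found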
Variable Names:
1) Camel case, for example → myVariableName

2) Pascal case, for example → MyVariableName

3) Snake case, for example → my_variable_name


Neural Networks (NN)
A neural network is a computational model. It consists of interconnected artificial neurons (or nodes) organized in layers.

Neural networks are used for various tasks, including pattern recognition, classification, regression, and other machine learning tasks.

Applications of neural networks include image recognition, natural language processing, and autonomous vehicles.

The depth and complexity of neural networks can vary, ranging from simple models like the perceptron to more complex architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

The perceptron is the simplest form (architecture) of a neural network (the foundational concept in neural networks).

A perceptron is a type of neuron; it takes multiple binary inputs.

A neuron takes multiple input signals (numerical values that represent various features or aspects of the input data).

A perceptron typically consists of a single layer with one or more artificial neurons. It takes input values, applies weights, adds a bias, and outputs a binary value (0 or 1) based on an activation function.

The perceptron is the basic building block; by connecting perceptrons in layers, we form neural networks.
Artificial neurons:
Also known as perceptrons, artificial neurons are the basic computational units in a neural network. Their operation involves taking inputs, applying weights to those inputs, summing them up, adding a bias, and then passing the result through an activation function to produce an output (for classification, the activation function is applied to produce the class output).

The connections (edges) between neurons in adjacent layers are formed by the weights.

Each neuron in one layer is connected to every neuron in the next layer.

This layered structure is known as a feedforward neural network.

Weights determine the strength and sign (positive or negative) of the connection between two
neurons.

Larger weights amplify the influence of the input on the connected neuron, while smaller
weights reduce it
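
A minimal sketch of a single artificial neuron: weighted sum, bias, activation (all values are illustrative):

import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b  # weighted sum of the inputs plus the bias
    return max(0.0, z)    # ReLU activation (a sigmoid would suit an output layer)

x = np.array([0.5, -1.0, 2.0])  # input signals (features)
w = np.array([0.8, 0.2, 0.5])   # connection weights
print(neuron(x, w, b=0.1))      # the neuron's output: 1.3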

Artificial Neurons (Perceptrons) and Layered Connections

Artificial neurons (perceptrons) are the basic building blocks of a neural network.

Types of Neural Network Architectures:

1) Feedforward Neural Networks (FNNs)
   - Basic neural network architecture where information travels in one direction: from the input layer to the output layer.
   - Layers include an input layer, one or more hidden layers, and an output layer.
   - Used for tasks like classification and regression (supervised learning).
   - Used in a variety of applications where the input data can be mapped to an output without the need for explicit memory of past inputs.

2) Radial Basis Function Networks (RBFNs)
   - Consist of input, hidden, and output layers.
   - Use radial basis functions as activation functions in the hidden layer.
   - Often used for function approximation and pattern recognition.
   - Used when dealing with problems that involve non-linear mappings or complex data relationships.

3) Convolutional Neural Networks (CNNs)

4) Recurrent Neural Networks (RNNs)

5) Deep Belief Networks (DBNs)

Explicit memory of past inputs: the ability of a model to retain and utilize information from past inputs (historical information encountered during training) in its decision-making process. It involves the ability of the model to learn and remember patterns, dependencies, or relationships in the training data, which may include sequences or temporal patterns.

Radial basis functions: mathematical functions used both as activation functions in neural networks (specifically RBF networks) and as kernel functions in support vector machines for non-linear mapping of input data. They are especially useful when dealing with problems that involve non-linear relationships.
Neurons are connected in layers (structure of a neural network):

1) Input layer: neurons in this layer receive the initial input data; each neuron has its own weights.
2) One or more hidden layers: process the input data through interconnected artificial neurons, applying weights and biases and passing the results through activation functions. Hidden layers allow the network to capture intricate relationships and non-linear patterns that may exist in the input data.
3) Output layer: neurons in this layer produce the final output of the network.

The connections between neurons have associated weights that are adjusted during the training process.

Hidden layers contribute to the learning process (leading to improved performance) through:

- Non-linearity
- Feature extraction
- Hierarchical learning
- Representation learning
- Enhanced expressiveness
- Generalization
- Adaptability to task complexity


The number of neurons in the input layer depends on the number of features in the model (dataset); the inputs enter through the linear regression equation (w * x + b).

The number of hidden layers depends on the complexity or size of the data.

Hidden layers create a non-linear relationship between input and output and capture more complex relationships between them (a hidden layer acts as feature engineering).

Each neuron connects to all neurons in the next layer.

The ReLU (rectified linear unit) function is used between the input layer and the hidden layer, and between the hidden layer and the output layer; the sigmoid function is used in the output layer.

The Softmax activation function can be used in the output layer for multi-class classification problems. It takes a vector of raw scores (logits) as input and converts them into probabilities, so the output values are in the range (0, 1), and the sum of the probabilities across all classes is equal to 1.
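
A minimal softmax sketch turning raw scores (logits) into probabilities that sum to 1 (the logits are made up):

import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))  # raw scores for 3 classes
print(probs, probs.sum())                   # each in (0, 1); sums to 1.0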

The training process of a neural network involves both feedforward and backpropagation steps.

Feedforward: information flows in one direction, from the input layer through one or more hidden layers to the output layer (primarily used for making predictions or classifications).

Backpropagation: a supervised-learning algorithm used to train neural networks by minimizing the error between the predicted and actual output, improving the network's ability to make accurate predictions. It involves both forward and backward passes: the forward pass generates predictions, while the backward pass propagates the error backward through the network using the chain rule, adjusting weights and biases based on the error.

The chain rule: used to calculate the gradients of the error with respect to the parameters of the network during training, as the error is propagated backward through the network.
Single-Layer Perceptron (SLP)
The simplest form of a neural network. It consists of an input layer and an output layer only, with no hidden layers. Neurons in the input layer represent the features of the input data; neurons in the output layer produce the network's output. Used for linearly separable problems (2 classes).

Multi-Layer Perceptrons (MLPs)

An MLP is a type of neural network that consists of multiple layers of neurons, including an input layer, one or more hidden layers, and an output layer. Each neuron in one layer is connected to every neuron in the next layer. MLPs can learn complex patterns and relationships in data due to the presence of hidden layers. Used for non-linear mappings (more than 2 classes), complex pattern recognition, diverse applications, tabular data and feature engineering, and large and diverse datasets.

The training process involves adjusting the weights and biases of the perceptron to correctly classify the input data (via the backpropagation algorithm), as in the sketch below.
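
A minimal Keras MLP sketch matching this structure: input layer, one ReLU hidden layer, sigmoid output, trained by backpropagation (layer sizes and toy data are assumptions):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(3,)),                      # 3 input features
    keras.layers.Dense(8, activation="relu"),     # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(100, 3)             # toy data
y = (X.sum(axis=1) > 1.5).astype(int)  # toy binary labels
model.fit(X, y, epochs=5, verbose=0)   # backpropagation happens here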

Backpropagation Algorithm (optimizer)

The process where the error between the predicted output and the actual output is used to optimize the weights and biases of the neurons in a neural network. The goal is to minimize the error.

Optimization algorithms:
1) Gradient descent (linear regression)
2) Backpropagation (neural networks)

The Adam optimizer is the most used optimization algorithm in deep learning models.
Overfitting usually happens when using many neurons (many features), because that creates high complexity. It's good to increase the number of features or neurons in the network to increase accuracy, but we should apply regularization to avoid overfitting.

Types of regularization in NNs (a minimal sketch follows the list):

1) Batch normalization
2) Dropout
3) L2 regularization (e.g., via Keras)
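
A minimal Keras sketch showing all three regularization types in one model (layer sizes and rates are illustrative):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(3,)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # 3) L2 penalty
    layers.BatchNormalization(),                             # 1) batch normalization
    layers.Dropout(0.3),                                     # 2) drop 30% of units
    layers.Dense(1, activation="sigmoid"),
])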

Four things we can monitor: training accuracy, training loss, validation accuracy, validation loss.

Accuracy does not care whether the prediction was 0.8, 1, or 0.6 (accuracy only cares whether the prediction matches the actual value).

Loss does care whether the prediction was 0.8, 1, or 0.6.

Loss Functions
Also known as cost functions or objective functions (like MSE in linear regression); they penalize the model based on the probabilities in the output.

They measure the difference between predicted and actual values.

The choice of loss function depends on the type of problem being solved (e.g., regression or classification).
Common types of loss functions and their roles:
1) Mean Squared Error (MSE) → commonly used for regression tasks
2) Mean Absolute Error (MAE) → commonly used for regression tasks
3) Binary Cross-Entropy Loss (Log Loss) → commonly used for binary classification problems
4) Categorical Cross-Entropy Loss → used for multi-class classification problems
5) Hinge Loss → commonly used for support vector machines (SVMs) and some types of binary classification tasks
6) Huber Loss → a combination of MSE and MAE; Huber loss is less sensitive to outliers than MSE. It is often used in regression tasks where the presence of outliers may affect model performance.

Binary cross-entropy loss is commonly used in binary classification problems with a sigmoid activation function in the output layer of the neural network.

For example, when p = 0.8 (close to 1) and the true label is 1, there is still a loss for the remaining 0.2; the model is penalized on this 0.2 so it decreases the loss in the future by adjusting its weights and biases to improve its predictions (minimizing the binary cross-entropy loss).
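
A minimal worked sketch of binary cross-entropy for that case:

import numpy as np

def binary_cross_entropy(y_true, p):
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(binary_cross_entropy(1, 0.8))   # ~0.223: penalized for the missing 0.2
print(binary_cross_entropy(1, 0.99))  # ~0.010: much smaller loss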

Machine Learning Frameworks

scikit-learn (sklearn) is used for linear regression and logistic regression.

For neural network / deep learning models you can use:

1) TensorFlow (belongs to Google)
   - Very fast, but a low-level interface (you implement everything yourself; not beginner-friendly).

2) Keras (built on top of TensorFlow, belongs to Google)
   - Not as fast, but a friendly interface (easier to use than TensorFlow).

3) PyTorch (belongs to Meta/Facebook; balanced between TensorFlow and Keras)
   - Good speed (like TensorFlow) and a good interface (not as easy as Keras, but better than TensorFlow).

PyTorch is now used in many AI applications like ChatGPT. PyTorch is a framework with a set of classes, each with its own behavior; we inherit from some of these classes and use their functions (a minimal sketch follows).
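
A minimal PyTorch sketch of the inheritance pattern described here: subclass nn.Module and reuse its behavior (the layer sizes are assumptions):

import torch
from torch import nn

class SmallNet(nn.Module):  # inherit from the framework's class
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(3, 8)
        self.out = nn.Linear(8, 1)

    def forward(self, x):   # define how data flows through the net
        x = torch.relu(self.hidden(x))
        return torch.sigmoid(self.out(x))

model = SmallNet()
print(model(torch.rand(2, 3)))  # predictions for 2 toy rows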
Confusion Matrix
The confusion matrix is a tool in machine learning that allows developers and data scientists to evaluate and improve the performance of classification models by providing valuable metrics such as accuracy, precision, recall, F1-score, and more.

A confusion matrix is a table that summarizes the model's prediction results on a validation dataset.

It can help identify the types of errors the model is making: false positives, false negatives, true positives, and true negatives.
Classification Report

- Precision is sensitive to the number of false positives and is useful when the cost of false positives is high. A high precision indicates that the model has a low rate of false positives.

- Recall is sensitive to the number of false negatives and is useful when the cost of missing positive instances is high.

- Support represents the number of actual occurrences of the class in the specified dataset.
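
A minimal evaluation sketch with scikit-learn (the labels are toy values):

from sklearn.metrics import confusion_matrix, classification_report

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels from a validation set
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

print(confusion_matrix(y_true, y_pred))
# [[3 1]    rows = actual class, columns = predicted class:
#  [1 3]]   TN FP / FN TP

print(classification_report(y_true, y_pred))  # precision, recall, f1, support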
Explainability
The ability to understand and interpret the decisions and predictions made by a machine learning model.

SHAP values:
Directly associated with the feature domain; they provide insights into the impact of each feature on the model's predictions.
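
A minimal SHAP sketch (assumes the shap package; TreeExplainer is a real shap API, while the model and data here are toy stand-ins):

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 3, 4], [0, 1, 3], [1, 2, 1], [0, 5, 2]])
y = [0, 1, 0, 1]
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # per-feature impact on each prediction
print(shap_values)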
