Artificial Intelligence
1) Artificial Intelligence
Artificial Intelligence (AI) is revolutionizing business operations by enabling systems to mimic human
intelligence, learn from data, and make informed decisions. Businesses use AI to automate
processes, enhance customer experiences, improve decision-making, and drive innovation.
With advances in CPUs, GPUs, and cloud infrastructure, businesses can now process
vast amounts of data quickly and run complex AI models efficiently. This enables smarter decision-
making, personalized customer experiences, and improved operational performance.
• Supervised learning methods include Linear Regression, Logistic Regression, Decision Trees,
Random Forest, Support Vector Machines (SVM), and Neural Networks.
These techniques are applied in various real-world business scenarios, for example:
1. Healthcare
2. Retail
• Inventory automation
3. Transportation
• Predictive maintenance
4. Agriculture
• Automated harvesting
5. Education
AI helps businesses gain competitive advantage by uncovering insights from big data, improving
efficiency, and supporting smarter, faster decision-making in today’s dynamic environment.
2) Linear Regression & Regularization
Linear Regression
Linear Regression is a supervised learning algorithm used to model the relationship between a
dependent variable (target) and one or more independent variables (features) by fitting a straight
line (linear equation) to the observed data.
Types:
• Simple Linear Regression: one independent variable.
• Multiple Linear Regression: two or more independent variables.
Assumptions:
• Linear relationship between the features and the target
• No or little multicollinearity
• Independence of observations
• Homoscedasticity (constant variance of the residuals)
• Residuals are approximately normally distributed
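Since the section heading also mentions regularization, here is a brief sketch of the model and the two common penalized objectives in standard notation (the notation below is ours, not taken from these notes):

\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
\text{OLS:} \quad \min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\text{Ridge (L2):} \quad \min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\text{Lasso (L1):} \quad \min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|

Here \lambda \ge 0 controls the penalty strength: larger values shrink the coefficients more, and the L1 penalty can set some coefficients exactly to zero, effectively performing feature selection.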
3) PCA
Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while
preserving as much variance (information) in the data as possible. It transforms a large set of
variables into a smaller set of uncorrelated variables called principal components.
Each principal component is a linear combination of the original variables and is constructed in such
a way that:
• The first principal component captures the maximum variance in the data.
• The second principal component captures the next highest variance, and so on.
• All components are orthogonal (uncorrelated) to each other.
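A minimal sketch of PCA via eigendecomposition of the covariance matrix, assuming NumPy is available (the data values are illustrative only):

import numpy as np

# Toy data: 6 samples, 3 features
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.1],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 0.1],
              [2.3, 2.7, 0.6]])

# 1. Center the data (PCA assumes zero-mean features)
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition; eigenvectors are the principal components
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by explained variance (largest eigenvalue first)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Project onto the top-2 components for dimensionality reduction
X_reduced = Xc @ eigvecs[:, :2]
print("Explained variance ratio:", eigvals[:2] / eigvals.sum())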
4) Logistic Regression
Logistic Regression is a supervised machine learning algorithm used for classification tasks. It predicts
the probability that a given input belongs to a particular class.
For binary classification, it uses the sigmoid function. For multiclass classification, it uses the softmax
function.
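For reference, the sigmoid squashes any real-valued score into (0, 1); in standard notation:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad P(y = 1 \mid x) = \sigma(w^{\top} x + b)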
Softmax Function – Explained
The Softmax function is an activation function used in multiclass classification problems. It converts
a vector of raw scores (called logits) into a probability distribution — where each value is between 0
and 1, and all probabilities sum up to 1.
It is typically used in the output layer of models like multiclass logistic regression or neural networks
where an input is classified into one of several mutually exclusive classes.
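In standard notation, for a vector of logits z = (z_1, \dots, z_K):

\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K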
Properties of Softmax
• Outputs lie between 0 and 1 and always sum to 1, forming a valid probability distribution.
• Adding the same constant to every logit leaves the output unchanged.
• Numerically stable version: subtract the max value from all logits before applying softmax; this uses the invariance above to avoid overflow in the exponentials (a short sketch follows).
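A minimal NumPy sketch of the numerically stable version (subtracting the max cancels in the ratio, so the probabilities are identical):

import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability;
    # exp() of large values would otherwise overflow.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approx [0.659 0.242 0.099]
print(probs.sum())  # 1.0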
5) Bayes Classification
The Bayes Classifier is a supervised learning method based on Bayes’ Theorem, used to predict the
class label Y = Ci given a set of features X. It is widely used in applications like:
• Text classification
• Spam filtering
• Recommendation systems
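The classifier picks the class with the highest posterior probability; in standard notation:

P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}, \qquad \hat{Y} = \arg\max_{C_i} P(X \mid C_i)\, P(C_i)

Since P(X) is the same for every class, it can be dropped when comparing posteriors, as in the arg max above. The popular Naive Bayes variant additionally assumes the features in X are conditionally independent given the class, which makes the likelihood easy to compute.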
6) Neural Network (NN), Deep Neural Network (DNN) &
Convolutional Neural Network (CNN)
A deep neural network is an extension of a basic neural network that contains multiple hidden layers
between the input and output layers. The increased depth allows DNNs to learn more complex and
abstract features from data. Each additional layer enables the network to build upon the
representations learned in previous layers, making DNNs especially powerful for tasks involving high-
dimensional data such as images, audio, and text. DNNs are the foundation of deep learning, a
subfield of machine learning focused on large, multi-layered neural networks.
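A minimal NumPy sketch of a forward pass through a two-hidden-layer network; the weights here are random placeholders, whereas a real network would learn them via backpropagation:

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# Layer sizes: 4 inputs -> 8 hidden -> 8 hidden -> 3 outputs
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=4)   # one input sample

# Each layer builds on the previous layer's representation
h1 = relu(x @ W1 + b1)
h2 = relu(h1 @ W2 + b2)
logits = h2 @ W3 + b3    # raw class scores (could be fed to softmax)
print(logits)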
A convolutional neural network is a specialized type of deep neural network designed primarily for
processing grid-like data, such as images. CNNs use convolutional layers that apply filters (kernels) to
local regions of the input, capturing spatial hierarchies and patterns (like edges, textures, and
shapes). Key components of CNNs include:
• Convolutional Layers: Extract local features using filters.
• Pooling Layers: Reduce spatial dimensions, making the network more efficient and robust to
small translations.
CNNs are highly effective for image and video recognition, object detection, and similar tasks due to
their ability to automatically learn spatial features from raw data.
A CNN filter (also called a kernel) is a small matrix of weights used in Convolutional Neural Networks
(CNNs) to detect features in an image, such as edges, textures, shapes, or patterns.
• The filter slides across the input image, one local region at a time.
• At each position, it performs element-wise multiplication and sums the result to produce a single value in the output feature map, as sketched below.
• This process helps extract important spatial features from the image.
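A minimal NumPy sketch of a single filter sliding over an image (no padding, stride 1); the kernel shown is a common vertical-edge detector, chosen purely for illustration:

import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the local patch by the kernel, then sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=float)

# Vertical edge detector (Sobel-like)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)  # strong responses where intensity changes left-to-right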
Applications
• Image classification and video recognition
• Object detection and face recognition
• Medical image analysis
7) K-Means Clustering
K-Means is an unsupervised machine learning algorithm used to group similar data points into K clusters.
• It tries to minimize the distance between data points and their respective cluster centroids
(mean of points in that cluster).
• It is widely used for pattern recognition, customer segmentation, market basket analysis, etc.
Steps:
1. Choose the number of clusters, K.
2. Initialize K centroids (e.g., by picking K random points).
3. Assign each data point to its nearest centroid.
4. Recalculate the centroids as the mean of all points assigned to each cluster.
5. Repeat steps 3 & 4 until the centroids stop changing significantly or a maximum number of iterations is reached.
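A minimal sketch of the steps above using scikit-learn (assumed installed); the two-blob data is synthetic and purely illustrative:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: two loose blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])

# Fit K-Means with K=2; n_init controls restarts from random centroids
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # centroid coordinates (mean of each cluster)
print(km.labels_[:10])      # cluster assignment for the first 10 points
print(km.inertia_)          # sum of squared distances to nearest centroid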
Assumptions of K-Means
1. Spherical clusters: Assumes clusters are roughly spherical with similar variance in all directions.
2. Equal-sized clusters: Assumes clusters have roughly the same number of observations.
3. Mean is a good measure of center: Works best when the mean represents the center of the
data well.
4. Low-dimensional and continuous variables: Performs better when the features are numeric
and scaled.
Interpreting K-Means Results
1. Cluster Centroids
• For example, in customer segmentation, the centroid tells you the average age, income, etc.,
of customers in that cluster.
2. Elbow Method
• Plot the within-cluster sum of squares (inertia) for a range of K values.
• Look for the "elbow point" where the rate of decrease sharply slows; this is usually the optimal K.
3. Silhouette Score
• Measures how similar a point is to its own cluster compared to other clusters.
• Ranges from -1 to 1: values near 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points may be assigned to the wrong cluster (see the sketch after this list).
4. Cluster Distribution
• Check how many points fall into each cluster; highly uneven sizes can signal a poor choice of K.
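A minimal scikit-learn sketch (assumed installed) of the elbow and silhouette checks on synthetic three-blob data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(6, 1, (50, 2)),
               rng.normal(12, 1, (50, 2))])

# Compute inertia (for the elbow plot) and silhouette for each K
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    print(f"K={k}: inertia={km.inertia_:.1f}, silhouette={score:.3f}")
# Expect the inertia curve to "elbow" and the silhouette to peak near K=3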
8) Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. While it can handle regression problems, SVM is particularly well-suited for classification. SVM is also known as an optimal margin classifier.
The goal of SVM is to find the optimal hyperplane that divides the classes with the maximum
margin.
• Support Vectors: The closest data points to the hyperplane, crucial for determining the
hyperplane and margin in SVM.
• Margin: The distance between the hyperplane (decision boundary) and the support vectors. SVM aims to maximize this margin; a wider margin generally means better generalization.
• Kernel: A function that maps data to a higher-dimensional space, enabling SVM to handle
non-linearly separable data.
• Hard Margin: A maximum-margin hyperplane that perfectly separates the data without
misclassifications.
• Soft Margin: Allows some misclassifications by introducing slack variables, balancing margin
maximization and misclassification penalties when data is not perfectly separable.
• Hinge Loss: A loss function penalizing misclassified points or margin violations, combined
with regularization in SVM.
• Dual Problem: Involves solving for Lagrange multipliers associated with support vectors,
facilitating the kernel trick and efficient computation.
Types of SVM
1. Linear SVM
o Separates linearly separable data with a straight hyperplane (no kernel needed).
2. Non-Linear SVM
o Applies the kernel trick to project data into a higher dimension where it can be separated linearly.
Kernel Trick
Kernels transform input data into higher-dimensional space to make it linearly separable.
Common Kernels:
• Linear: K(x, y) = xᵀy
• Polynomial: K(x, y) = (xᵀy + c)^d
• RBF (Gaussian): K(x, y) = exp(−γ‖x − y‖²)
• Sigmoid: K(x, y) = tanh(γ xᵀy + c)
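A minimal scikit-learn sketch (assumed installed) comparing a linear and an RBF kernel on toy non-linearly separable data:

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# Two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Linear kernel: a straight decision boundary (will underfit here)
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# RBF kernel: maps data implicitly into a higher-dimensional space
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("Linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))
print("Support vectors per class:", rbf_svm.n_support_)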
9) Recommendation System
A Recommendation System (RS) is a subclass of information filtering systems designed to predict a
user’s interest in an item (e.g., product, movie, article). These systems are used across various
industries such as e-commerce, entertainment, and online services.
Purpose:
To suggest relevant items based on user preferences, past behaviour, and similar users/items.
To enhance user experience, improve customer retention, and increase sales or engagement.
Content-Based Filtering (CBF) is a technique that recommends items to a user by comparing the
features of items with the user's preferences. The core idea is to match user and item profiles based
on feature similarity.
Example: If a user rates several action movies highly, the system recommends other movies whose features (genre, actors, keywords) match those films.
Pros:
• Personalized recommendations.
Cons:
• Limited diversity.
Collaborative Filtering (CF)
• Definition: Recommends items based on users' past interactions and the preferences of similar users, using behaviour data (ratings, clicks, purchases).
• Finds users with similar preferences and recommends what they liked.
Example: "Users who bought this item also bought…" suggestions on e-commerce sites.
Pros:
• Needs no item feature engineering; it learns purely from behaviour data.
Cons:
• Cold-start problem for new users and new items; struggles with sparse interaction data.
• Approaches: user-based CF (find similar users), item-based CF (find similar items), and matrix factorization (learn latent factors from the ratings matrix).
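A minimal content-based sketch, ranking items by cosine similarity between item feature vectors and a user profile; all names and feature values below are made up for illustration:

import numpy as np

# Hypothetical item features: [action, comedy, romance]
items = {
    "Movie A": np.array([1.0, 0.0, 0.0]),
    "Movie B": np.array([0.8, 0.2, 0.0]),
    "Movie C": np.array([0.0, 0.1, 0.9]),
}

# User profile: average features of items the user liked (action-heavy here)
user_profile = np.array([0.9, 0.1, 0.0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank items by similarity to the user's profile
ranked = sorted(items.items(),
                key=lambda kv: cosine(user_profile, kv[1]),
                reverse=True)
for name, vec in ranked:
    print(name, round(cosine(user_profile, vec), 3))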
10) Word2vec
Word2Vec is a shallow, two-layer neural network developed by Tomas Mikolov (Google) that learns
vector representations of words (word embeddings) from a large corpus of text. Unlike traditional
methods like one-hot encoding or bag-of-words, which produce high-dimensional, sparse vectors
with no sense of word relationships, Word2Vec learns dense, low-dimensional vectors that capture
semantic meaning. It is trained using self-supervised learning, generating training signals directly from raw text through predictive tasks.
Word2Vec is all about creating embeddings for words from raw text using a shallow neural network.
In Natural Language Processing (NLP), text refers to sequences of words, sentences, or documents in
human language — like English, Spanish, etc.
Embeddings are vector representations of words, sentences, or documents in a continuous, low-
dimensional space.
In simpler terms:
• Each word (or piece of text) gets mapped to a vector of real numbers.
• These vectors capture meaning, so similar words (like king and queen) will have vectors that
are close together in space.
• CBOW (Continuous Bag of Words): The CBOW model predicts the current word given the context words within a specific window. The input layer contains the context words and the output layer contains the current word. The size of the hidden layer sets the number of dimensions used to represent the word.
• Skip-Gram: Skip-gram predicts the surrounding context words within a specific window, given the current word. The input layer contains the current word and the output layer contains the context words. The hidden layer's size again determines the embedding dimensionality.
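A minimal sketch with the gensim library (assumed installed); the tiny corpus is illustrative only, since useful embeddings require a large corpus:

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# sg=1 selects skip-gram; sg=0 would select CBOW instead
model = Word2Vec(sentences, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=200, seed=42)

print(model.wv["king"][:5])                  # first 5 dims of the embedding
print(model.wv.similarity("king", "queen"))  # cosine similarity of vectors
print(model.wv.most_similar("dog", topn=2))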