Artificial Intelligence

The document discusses the impact of Artificial Intelligence (AI) on business operations, highlighting its applications in various sectors such as customer experience, finance, healthcare, and more. It covers key AI techniques like supervised and unsupervised learning, including Linear Regression, PCA, and Neural Networks, along with their assumptions and applications. Additionally, it explains concepts like Logistic Regression, Support Vector Machines, and Recommendation Systems, emphasizing their roles in enhancing decision-making and operational efficiency.


1) Artificial Intelligence (AI) in Business

Artificial Intelligence (AI) is revolutionizing business operations by enabling systems to mimic human
intelligence, learn from data, and make informed decisions. Businesses use AI to automate
processes, enhance customer experiences, improve decision-making, and drive innovation.

With the advent of powerful CPUs, GPUs, and advanced cloud infrastructure, businesses can now process
vast amounts of data quickly and run complex AI models efficiently. This enables smarter decision-
making, personalized customer experiences, and improved operational performance.

AI techniques are commonly divided into supervised and unsupervised learning.

• Supervised learning methods include Linear Regression, Logistic Regression, Decision Trees,
Random Forest, Support Vector Machines (SVM), and Neural Networks.

• Unsupervised learning methods include K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), and Association Rule Mining.

These techniques are applied in various real-world business scenarios. Examples include:

1. Customer Experience & Marketing

• AI chatbots for real-time support

• Recommendation engines (Amazon, Netflix)

• Customer segmentation & sentiment analysis


2. Finance & Banking

• Fraud detection (e.g., PayPal)

• Credit scoring & risk assessment

• AI trading systems & chatbots (e.g., Erica – BoA)

3. Healthcare

• AI diagnostics (e.g., DeepMind for cancer detection)

• Medical image analysis

• Personalized medicine & outbreak prediction

4. Retail

• Dynamic pricing & demand forecasting

• Inventory automation

• Customer behavior analytics

5. Transportation

• Self-driving cars (Tesla, Waymo)

• Smart traffic & route optimization (Google Maps)

• Predictive maintenance

6. Agriculture

• Precision farming via drones/sensors

• Crop disease detection

• Automated harvesting

7. Education

• AI tutors (Duolingo, Coursera)

• Automated grading & cheating detection

• Research enhancement (e.g., protein design)

AI helps businesses gain competitive advantage by uncovering insights from big data, improving
efficiency, and supporting smarter, faster decision-making in today’s dynamic environment.
2) Linear Regression & Regularization

Linear Regression

Linear Regression is a supervised learning algorithm used to model the relationship between a
dependent variable (target) and one or more independent variables (features) by fitting a straight
line (linear equation) to the observed data.

Types:

1. Simple Linear Regression – One independent variable


Example: Predicting house price based on size (sq. ft.)

2. Multiple Linear Regression – More than one independent variable


Example: Predicting salary based on experience, education, and age

Assumptions:

• Linear relationship between variables

• No or little multicollinearity

• Homoscedasticity (equal variance of errors)

• Normally distributed errors

• Independence of observations
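
The heading also mentions regularization. As a minimal, hedged sketch, the snippet below fits an ordinary least-squares model with scikit-learn and shows how a Ridge (L2-regularized) model would be swapped in; the house-size data and the alpha value are made up for illustration.

```python
# Minimal linear regression sketch (scikit-learn); data values are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Feature: house size in sq. ft.; target: price in $1000s (made-up numbers)
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([150, 180, 210, 260, 300])

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("price for 1300 sq. ft.:", model.predict([[1300]])[0])

# Ridge adds an L2 penalty on the coefficients (one common form of regularization)
ridge = Ridge(alpha=1.0).fit(X, y)
```
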
3) PCA

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while
preserving as much variance (information) in the data as possible. It transforms a large set of
variables into a smaller set of uncorrelated variables called principal components.

Each principal component is a linear combination of the original variables and is constructed in such
a way that:
• The first principal component captures the maximum variance in the data.
• The second principal component captures the next highest variance, and so on.
• All components are orthogonal (uncorrelated) to each other.

How PCA is Used for Dimensionality Reduction

PCA reduces dimensionality by:


• Identifying the directions (principal components) where the data varies the most.
• Projecting the original data onto the top few principal components that capture most of the
variability.
• Discarding components with low variance, which are assumed to carry less information/noise.

This helps in:


• Simplifying models
• Speeding up machine learning algorithms
• Reducing overfitting
• Visualizing high-dimensional data in 2D/3D

Steps Involved in Performing PCA on a Dataset

1. Standardize the Data:


o Normalize each feature to have mean = 0 and standard deviation = 1.
o This ensures all variables contribute equally to the analysis.

2. Compute the Covariance Matrix:


o Measure how variables vary with respect to each other.
o Covariance matrix size: n x n, where n is the number of features.

3. Compute the Eigenvalues and Eigenvectors:


o Eigenvectors represent the directions (principal components).
o Eigenvalues indicate the magnitude of variance in those directions.

4. Sort Eigenvectors by Eigenvalues:


o Arrange them in descending order of eigenvalues.
o This ranks the components by their importance (variance explained).
5. Select the Top k Principal Components:
o Choose k components that explain a desired amount of total variance (e.g., 95%).
o This reduces the number of features while retaining most information.

6. Transform the Original Dataset:


o Multiply the original standardized data by the selected eigenvectors.
o This gives the dataset in the new reduced-dimensional space.
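
As a hedged sketch, the steps above can be written directly in NumPy; the small 2-feature dataset below is made up for illustration.

```python
# PCA via eigen-decomposition of the covariance matrix (NumPy only).
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])   # illustrative data

# 1. Standardize each feature (mean 0, std 1)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix (n_features x n_features)
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvalues (variance magnitudes) and eigenvectors (directions)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by descending eigenvalue
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Keep the top k components
k = 1
W = eigvecs[:, :k]

# 6. Project the standardized data onto the selected components
X_reduced = X_std @ W
print("explained variance ratio:", eigvals[:k] / eigvals.sum())
```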


4) Logistic Regression for Classification Using the Softmax Function

Logistic Regression is a supervised machine learning algorithm used for classification tasks. It predicts
the probability that a given input belongs to a particular class.

Logistic Regression Activation Functions

For binary classification, it uses the sigmoid function. For multiclass classification, it uses the softmax
function.
Softmax Function – Explained

The Softmax function is an activation function used in multiclass classification problems. It converts
a vector of raw scores (called logits) into a probability distribution — where each value is between 0
and 1, and all probabilities sum up to 1.

It is typically used in the output layer of models like multiclass logistic regression or neural networks
where an input is classified into one of several mutually exclusive classes.

Properties of Softmax

• Output range: (0, 1)

• Sum of all class probabilities = 1

• Converts arbitrary real values into a normalized probability distribution

• Numerically stable version: subtract max value from all logits before applying softmax
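
A minimal NumPy sketch of the numerically stable softmax just described; the logit values are illustrative.

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw class scores (logits)
probs = softmax(scores)
print(probs, probs.sum())            # each value in (0, 1); probabilities sum to 1
```
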
5) Bayes Classification

The Bayes Classifier is a supervised learning method based on Bayes’ Theorem, used to predict the
class label Y = Ci given a set of features X. It is widely used in applications like:

• Text classification

• Spam filtering

• Recommendation systems
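
As a hedged illustration, the sketch below uses scikit-learn's Multinomial Naive Bayes (a common Bayes'-theorem classifier for text) on a tiny made-up spam dataset.

```python
# Naive Bayes spam-filter sketch; the texts and labels are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ["win a free prize now", "meeting at noon tomorrow",
          "free offer click now", "project update attached"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)          # word-count features
clf = MultinomialNB().fit(X, labels)  # estimates P(word | class) and P(class)

# Prediction picks the class with the highest posterior P(class | words)
print(clf.predict(vec.transform(["free prize offer"])))
```
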
6) Neural Network (NN), Deep Neural Network (DNN) &
Convolutional Neural Network (CNN)

Neural Network (NN)

A neural network is a computational model inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers. Each neuron receives input, processes it using
weights and biases, applies an activation function, and passes the result to the next layer. Neural
networks are capable of learning complex patterns from data through a process called training,
where weights and biases are adjusted to minimize prediction errors. They are widely used for tasks
like classification, regression, and pattern recognition.
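
A tiny NumPy sketch of a forward pass through one layer, assuming a sigmoid activation; the inputs, weights, and biases are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])        # input features
W = np.array([[0.2, -0.4, 0.1],
              [0.7,  0.3, -0.5]])     # weights: 2 neurons x 3 inputs
b = np.array([0.1, -0.2])             # one bias per neuron

# Each neuron computes a weighted sum plus bias, then applies the activation
output = sigmoid(W @ x + b)
print(output)                          # values passed on to the next layer
```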

Deep Neural Network (DNN)

A deep neural network is an extension of a basic neural network that contains multiple hidden layers
between the input and output layers. The increased depth allows DNNs to learn more complex and
abstract features from data. Each additional layer enables the network to build upon the
representations learned in previous layers, making DNNs especially powerful for tasks involving high-
dimensional data such as images, audio, and text. DNNs are the foundation of deep learning, a
subfield of machine learning focused on large, multi-layered neural networks.

Convolutional Neural Network (CNN)

A convolutional neural network is a specialized type of deep neural network designed primarily for
processing grid-like data, such as images. CNNs use convolutional layers that apply filters (kernels) to
local regions of the input, capturing spatial hierarchies and patterns (like edges, textures, and
shapes). Key components of CNNs include:
• Convolutional Layers: Extract local features using filters.

• Pooling Layers: Reduce spatial dimensions, making the network more efficient and robust to
small translations.

• Fully Connected Layers: Combine features for final classification or regression.

CNNs are highly effective for image and video recognition, object detection, and similar tasks due to
their ability to automatically learn spatial features from raw data.

A CNN filter (also called a kernel) is a small matrix of weights used in Convolutional Neural Networks
(CNNs) to detect features in an image, such as edges, textures, shapes, or patterns.

What Does a CNN Filter Do?

• It slides (or convolves) across the input image.

• At each position, it performs element-wise multiplication and sums the result to produce a
single value in the output feature map.

• This process helps extract important spatial features from the image.
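
A small NumPy sketch of this sliding multiply-and-sum, using a made-up 5x5 image and a 3x3 vertical-edge kernel.

```python
import numpy as np

image = np.array([[0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 1]], dtype=float)   # illustrative image

kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)       # vertical-edge filter

k = kernel.shape[0]
out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))

# Slide the filter; at each position: element-wise multiply, then sum
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        patch = image[i:i + k, j:j + k]
        out[i, j] = np.sum(patch * kernel)

print(out)   # 3x3 feature map; large values mark the vertical edge
```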

Applications

• Image Classification (e.g., ResNet, VGG, AlexNet)

• Object Detection (e.g., YOLO, Faster R-CNN)

• Facial Recognition (e.g., FaceNet)

• MRI Scan Analysis


7) K Means Clustering
What is K-Means Clustering?

K-Means is an unsupervised machine learning algorithm used to group similar data points into K
clusters.

• It tries to minimize the distance between data points and their respective cluster centroids
(mean of points in that cluster).

• It is widely used for pattern recognition, customer segmentation, market basket analysis, etc.

How Does K-Means Work?

1. Choose the number of clusters (K).

2. Initialize centroids (either randomly or using techniques like KMeans++).

3. Assign points to the nearest centroid.

4. Recalculate the centroids as the mean of all points assigned to each cluster.

5. Repeat steps 3 & 4 until centroids don’t change much or a max number of iterations is
reached.
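
A compact NumPy sketch of these steps (illustrative only; it omits empty-cluster handling, and in practice sklearn.cluster.KMeans would normally be used).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose K and initialize centroids at randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Illustrative data: two well-separated groups of points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```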

Assumptions of K-Means

1. Spherical clusters: Assumes clusters are spherical in shape.

2. Equal-sized clusters: Assumes clusters have roughly the same number of observations.

3. Mean is a good measure of center: Works best when the mean represents the center of the
data well.

4. Low-dimensional and continuous variables: Performs better when the features are numeric
and scaled.

5. No/few outliers: Outliers can skew centroids significantly.

Interpreting K-Means Clustering

1. Cluster Centroids

• These represent the "average profile" of the cluster.

• For example, in customer segmentation, the centroid tells you the average age, income, etc.,
of customers in that cluster.

2. Inertia (Within-Cluster Sum of Squares)

• A measure of how tightly packed the clusters are.


• Lower inertia = more compact clusters.

3. Elbow Method (to choose K)

• Plot number of clusters vs. inertia.

• Look for the "elbow point" where the rate of decrease sharply slows—this is usually the
optimal K.

4. Silhouette Score

• Measures how similar a point is to its own cluster compared to other clusters.

• Ranges from -1 to 1:

o Close to 1 → well clustered

o Close to 0 → overlapping clusters

o Negative → wrong clustering

5. Cluster Distribution

• Analyze the number of observations in each cluster.

• Imbalanced clusters might suggest poor K value or non-spherical data.
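
A short scikit-learn sketch of the inertia (elbow) and silhouette checks described above, on synthetic blob data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)   # synthetic data

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k,
          "inertia:", round(km.inertia_, 1),                   # within-cluster sum of squares
          "silhouette:", round(silhouette_score(X, km.labels_), 3))
# Pick K at the "elbow" in inertia and/or the highest silhouette score.
```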


8) Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and
regression tasks. While it can handle regression problems, SVM is particularly well-suited for
classification. It is also known as an optimal margin classifier.

The goal of SVM is to find the optimal hyperplane that divides the classes with the maximum
margin.

Support Vector Machine (SVM) Terminology

• Hyperplane: A decision boundary separating different classes in feature space, represented by the equation w·x + b = 0 in linear classification.

• Support Vectors: The closest data points to the hyperplane, crucial for determining the
hyperplane and margin in SVM.

• Margin: The distance between the hyperplane and the nearest data points (the support vectors) on either side. SVM aims to maximize this margin, since a wider margin generally gives better generalization.

• Kernel: A function that maps data to a higher-dimensional space, enabling SVM to handle
non-linearly separable data.
• Hard Margin: A maximum-margin hyperplane that perfectly separates the data without
misclassifications.

• Soft Margin: Allows some misclassifications by introducing slack variables, balancing margin
maximization and misclassification penalties when data is not perfectly separable.

• C: A regularization term balancing margin maximization and misclassification penalties. A higher C value enforces a stricter penalty for misclassifications.

• Hinge Loss: A loss function penalizing misclassified points or margin violations, combined
with regularization in SVM.

• Dual Problem: Involves solving for Lagrange multipliers associated with support vectors,
facilitating the kernel trick and efficient computation.

SVM Assumptions

• Data is mostly separable, or nearly so (can use soft margin).

• Independent and identically distributed (i.i.d) data.

• Features are scaled, especially for RBF or polynomial kernels.

• Minimal noise and outliers, as they can affect margin significantly.


Types of SVM

1. Linear SVM

o Used when the data is linearly separable.

o Finds a straight-line (or flat) hyperplane.

2. Non-Linear SVM

o Used when the data is not linearly separable.

o Applies the kernel trick to project data into a higher dimension where it can be
separated linearly.

Kernel Trick

Kernels transform input data into higher-dimensional space to make it linearly separable.

Common Kernels:

• Linear – Linearly separable data

• Polynomial – Data with polynomial boundaries

• RBF (Gaussian) – Default for non-linear and complex data

• Sigmoid – Similar to neural networks (less common)
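
A brief scikit-learn sketch contrasting a linear and an RBF kernel on synthetic two-moons data; the C value and the scaling pipeline are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)    # non-linear data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    # Scale features first, as recommended for RBF/polynomial kernels
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    clf.fit(X_tr, y_tr)
    print(kernel, "accuracy:", round(clf.score(X_te, y_te), 3))
```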

9) Recommendation System
A Recommendation System (RS) is a subclass of information filtering systems designed to predict a
user’s interest in an item (e.g., product, movie, article). These systems are used across various
industries such as e-commerce, entertainment, and online services.

Purpose:

To suggest relevant items based on user preferences, past behaviour, and similar users/items.

To enhance user experience, improve customer retention, and increase sales or engagement.

2. Types of Recommendation Systems


2.1 Content-Based Filtering

Content-Based Filtering (CBF) is a technique that recommends items to a user by comparing the
features of items with the user's preferences. The core idea is to match user and item profiles based
on feature similarity.

Example:

• A user who watches action movies is recommended other action movies.

Pros:

• Personalized recommendations.

• No need for data from other users.

Cons:

• Limited diversity.

• Cold-start problem for new users or items.
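
A minimal content-based sketch using TF-IDF over item descriptions and cosine similarity; the movie descriptions are made up.

```python
# Content-based filtering sketch: recommend items similar to what the user liked.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "Movie A": "fast-paced action thriller with car chases",
    "Movie B": "romantic comedy about two chefs",
    "Movie C": "explosive action film with martial arts",
}

titles = list(items)
tfidf = TfidfVectorizer().fit_transform(items.values())
sim = cosine_similarity(tfidf)           # item-item feature similarity

liked = titles.index("Movie A")          # the user liked an action movie
ranked = sim[liked].argsort()[::-1]
print([titles[i] for i in ranked if i != liked])   # most similar items first
```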


2.2 Collaborative Filtering

• Definition: Recommends items based on users' past interactions and the preferences of similar users, using behaviour data such as ratings, clicks, and purchases.

a. User-Based Collaborative Filtering

• Finds users with similar preferences and recommends what they liked.

b. Item-Based Collaborative Filtering

• Finds items that are similar based on user interactions.

c. Matrix Factorization: Discovers hidden patterns by representing users and items with latent features.

Example:

• "Users who bought this also bought..."

Pros:

• Learns from real user behaviour.

• Captures complex and nuanced tastes.

Cons:

• Suffers from cold-start for new users/items.

• Data sparsity in large datasets can reduce accuracy.
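
A minimal item-based sketch using cosine similarity over a made-up user-item ratings matrix.

```python
# Item-based collaborative filtering sketch; ratings are made up for illustration.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([            # rows = users, columns = items
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
])
item_names = ["Item A", "Item B", "Item C", "Item D"]

# Item-item similarity = cosine similarity between rating columns
item_sim = cosine_similarity(ratings.T)

# Recommend items most similar to one the user interacted with
target = item_names.index("Item A")
ranked = item_sim[target].argsort()[::-1]
print([item_names[i] for i in ranked if i != target])
```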


2.3 Hybrid Recommendation Systems

• Definition: Combines two or more techniques (e.g., content-based + collaborative filtering).

• Approaches:

o Weighted: Assigns scores to each method and combines results.


o Switching: Switches between methods based on context.

o Mixed: Merges results from multiple methods.

Example:

• Netflix combines user preferences, watch history, and content metadata.

Pros:

• Balances individual weaknesses.

• Provides more accurate and flexible recommendations.

Cons:

• More complex to design and maintain.

10) Word2vec

Word2Vec is a shallow, two-layer neural network developed by Tomas Mikolov (Google) that learns
vector representations of words (word embeddings) from a large corpus of text. Unlike traditional
methods like one-hot encoding or bag-of-words, which produce high-dimensional, sparse vectors
with no sense of word relationships, Word2Vec learns dense, low-dimensional vectors that capture
semantic meaning. It is trained using self-supervised learning, generating training signals directly
from raw text through predictive tasks.

In short, Word2Vec creates embeddings for words from raw text using a shallow neural network.

In Natural Language Processing (NLP), text refers to sequences of words, sentences, or documents in
human language — like English, Spanish, etc.
Embeddings are vector representations of words, sentences, or documents in a continuous, low-
dimensional space.

In simpler terms:

• Each word (or piece of text) gets mapped to a vector of real numbers.

• These vectors capture meaning, so similar words (like king and queen) will have vectors that
are close together in space.

It uses two main architectures: CBOW and Skip-Gram.

• CBOW (Continuous Bag of Words): The CBOW model predicts the current word given
context words within a specific window. The input layer contains the context words and the
output layer contains the current word. The hidden layer contains the dimensions we want
to represent the current word present at the output layer.

• Skip-Gram: The Skip-Gram model predicts the surrounding context words within a specific window
given the current word. The input layer contains the current word and the output layer contains the
context words. The hidden layer contains the number of dimensions in which we want to represent the
current word present at the input layer.
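
A short sketch using the gensim library (assuming it is installed); the toy corpus is illustrative. In gensim, sg=0 selects CBOW and sg=1 selects Skip-Gram.

```python
# Word2Vec sketch with gensim; a real model needs a much larger corpus.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
]

# sg=0 -> CBOW (predict current word from context); sg=1 -> Skip-Gram
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["king"].shape)                       # dense 50-dimensional vector
print(skipgram.wv.most_similar("king", topn=2))    # nearest words by cosine similarity
```
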

Pending
----------------
PCA problem
Word2vec problem
CNN +relu problem
Neural network handwritten digit problem
