Artificial Intelligence Applications
UNIT 1
UNIT 2
UNIT 1
(Entirely AI generated as a reminder of basics. Just google whatever you think you might need help
with, or need to understand better.)
Intelligent agents have the following key characteristics:
Autonomy: Intelligent agents operate without direct human intervention, making their own decisions based on their perceptions.
Reactivity: They respond to changes in their environment, adapting their behavior accordingly.
Proactiveness: Intelligent agents can take the initiative to fulfill their objectives, rather than
merely reacting to external stimuli.
Social Ability: Some agents can communicate and collaborate with other agents to achieve their
goals.
The main types of intelligent agents are:
1. Simple Reflex Agents: Operate based on the current perception, following predefined rules.
2. Model-Based Reflex Agents: Maintain an internal state to track the world’s state based on past
actions and perceptions.
3. Goal-Based Agents: Consider future actions to achieve specific goals.
4. Utility-Based Agents: Choose actions based on the expected utility to maximize satisfaction.
5. Learning Agents: Improve their performance based on experience.
2. Problem Solving
Problem solving in artificial intelligence involves finding solutions to specific challenges using various
strategies. It encompasses the following aspects:
Problem Formulation
Problem formulation involves defining the problem in a structured way to facilitate the search for
solutions. Key components include:
1. Initial State: The starting point of the problem.
2. Goal State: The desired end condition that indicates the problem is solved.
3. Actions: The set of operations that can be performed to move from one state to another.
4. State Space: The complete set of possible states generated by the actions.
Heuristics
Heuristics are strategies or techniques that help to guide the search process by estimating how close
a state is to the goal. They can significantly reduce the search space and time. Key aspects include:
Heuristic Function (h(n)): A function that estimates the cost of the cheapest path from node (n)
to the goal.
Admissible Heuristic: A heuristic that never overestimates the cost to reach the goal.
Consistent Heuristic: A heuristic that satisfies the triangle inequality: for every node n and successor m, the estimate h(n) is at most the step cost from n to m plus h(m), i.e. h(n) ≤ c(n, m) + h(m).
1. A* Search Algorithm:
Combines the benefits of uniform-cost search and greedy heuristic guidance by using the evaluation function (see the sketch after this list):

$$f(n) = g(n) + h(n)$$

where:
g(n): the cost to reach node n.
h(n): the estimated cost from n to the goal.
2. Greedy Best-First Search:
Selects the node that appears to be closest to the goal based solely on the heuristic
function (h(n)).
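To make these two strategies concrete, here is a minimal A* sketch in Python. The graph, step costs, and heuristic values are invented purely for illustration and are not taken from the text.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n).

    graph: dict mapping node -> list of (neighbor, step_cost)
    h:     dict mapping node -> heuristic estimate of cost to the goal
    """
    frontier = [(h[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier,
                               (new_g + h[neighbor], new_g, neighbor, path + [neighbor]))
    return None, float("inf")

# Toy example (hypothetical graph and admissible heuristic values)
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star(graph, h, "A", "D"))  # -> (['A', 'B', 'C', 'D'], 4)
```

Greedy best-first search can be obtained from the same skeleton by ordering the frontier on h(n) alone instead of g(n) + h(n).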
Constraint Satisfaction
Constraint satisfaction problems (CSPs) involve finding values for variables under specific
constraints. Key concepts include:
CSP Techniques:
Backtracking Search: A depth-first search algorithm that systematically searches for a solution
by exploring variable assignments and backtracking when constraints are violated.
Forward Checking: A technique that reduces the search space by eliminating inconsistent
values from the domains of unassigned variables after each assignment.
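As an illustration of backtracking search combined with forward checking, the sketch below solves a tiny map-coloring CSP. The variables, domains, and adjacency are invented for the example.

```python
def backtracking_search(variables, domains, neighbors):
    """Backtracking search with forward checking for a binary 'not-equal' CSP."""
    def backtrack(assignment, domains):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            assignment[var] = value
            # Forward checking: prune this value from unassigned neighbors' domains.
            pruned = {n: [v for v in domains[n] if v != value]
                      for n in neighbors[var] if n not in assignment}
            if all(pruned[n] for n in pruned):  # no neighbor left with an empty domain
                result = backtrack(assignment, {**domains, **pruned})
                if result:
                    return result
            del assignment[var]  # constraint violated or dead end: backtrack
        return None
    return backtrack({}, domains)

# Toy map-coloring problem (hypothetical regions and adjacency)
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
print(backtracking_search(variables, domains, neighbors))
```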
State space formulation involves representing the problem as a graph where nodes correspond to states and edges correspond to the actions that transform one state into another.
State space representation is crucial for understanding the structure of the problem and facilitating
effective search strategies.
Breadth-First Search (BFS)
Strategy: Explores all nodes at the present depth before moving to the next level.
Space Complexity: O(b^d), which can be impractical for large depth.
Completeness: Complete; guarantees the shortest path in unweighted graphs.
Iterative Deepening
Iterative Deepening Depth-First Search (IDDFS) combines the benefits of DFS and BFS. It involves
performing repeated depth-limited searches with increasing depth limits until the goal is found. This method is particularly useful when the search space is large and the depth of the solution is not known in advance.
Summary
Unit I provides a foundational understanding of intelligent agents and the various problem-solving
strategies in artificial intelligence. It covers the formulation of problems, search strategies, and the
characteristics of different algorithms. These concepts are essential for building intelligent systems
capable of effectively solving complex challenges in real-world applications.
UNIT 2
Artificial Intelligence Applications: Basic Concepts
Definition of Learning Systems
A learning system in artificial intelligence (AI) refers to a system that can automatically improve its
performance at a given task over time through experience. It is a fundamental concept that drives the
development of intelligent systems. Learning systems aim to adapt and enhance their behavior based
on data, improving their decision-making abilities without being explicitly programmed to do so for
every new situation.
A learning system typically consists of the following components:
1. Data (Experience):
Learning systems rely on data, which can be seen as the 'experience' from which they draw
conclusions.
This data may come in various forms, such as:
Supervised Data: Input-output pairs that allow the system to learn from correct
answers.
Unsupervised Data: Unlabeled data where the system must find patterns.
Reinforcement Data: Data that comes in the form of feedback or rewards after
actions taken by the system.
2. Knowledge Base:
A system’s existing knowledge or rules which it uses to make sense of new data. This might
include predefined algorithms or models that serve as a starting point.
3. Inference Mechanism:
This is the process or model that takes the data and produces conclusions or predictions. It
could include:
Machine Learning Models: Algorithms such as decision trees, neural networks, etc.
Reasoning Mechanisms: Logical inference used to apply learned knowledge to new
situations.
4. Feedback Mechanism:
Learning systems often improve based on feedback from their performance. This feedback
helps the system to refine its internal parameters and models.
5. Performance Measure:
The metric by which the system evaluates how well it is learning or improving. This could
be:
Accuracy: Correctness of the predictions or decisions made.
Speed: How fast the system processes the data.
Generalization: The system's ability to apply learned knowledge to new, unseen data.
Natural Language Processing (NLP): Language models like GPT or BERT use deep learning
to generate and understand human language.
In summary, learning systems are the backbone of artificial intelligence, allowing machines to adapt
and perform tasks with increasing efficiency based on data and feedback. The design and
implementation of such systems depend on the type of learning (supervised, unsupervised,
reinforcement, etc.) and the specific task at hand.
Today, companies are using Machine Learning to improve business decisions, increase productivity,
detect disease, forecast weather, and do many more things. With the exponential growth of
technology, we not only need better tools to understand the data we currently have, but we also need
to prepare ourselves for the data we will have. To achieve this goal, we need to build intelligent machines. We can write a program to do simple things, but most of the time hard-wiring intelligence into it is difficult. The better approach is to give machines a way to learn things themselves: if a machine can learn from its input, it does the hard work for us. This is where Machine Learning comes into action. Some of the most common examples are:
Image Recognition
Speech Recognition
Recommender Systems
Fraud Detection
Self Driving Cars
Medical Diagnosis
Stock Market Trading
Virtual Try On
Goals of Machine Learning
1. Automation of Tasks:
Machine learning aims to automate tasks that would otherwise require explicit programming for every situation.
2. Improvement through Experience:
Machine learning systems are designed to improve their performance over time. As more
data is provided, the models learn to generalize better and make more accurate predictions
or decisions.
3. Generalization:
A key goal is to create models that generalize well to unseen data. The system should not
just perform well on training data but should also be able to handle new data in real-world
scenarios. This is critical for ensuring robustness.
4. Prediction and Forecasting:
Machine learning models are often used for predictive analytics. For example, predicting
future trends in stock markets, customer behavior, or disease outbreaks. By learning from
historical data, the system can predict future outcomes with a reasonable level of accuracy.
5. Classification and Regression:
A common goal is to solve classification and regression problems:
Classification: Assigning data points to predefined categories (e.g., spam or not
spam).
Regression: Predicting a continuous value based on input data (e.g., house prices
based on features).
6. Clustering and Pattern Recognition:
Machine learning systems aim to detect patterns and group similar data points together
through clustering. This is useful in many fields, such as customer segmentation,
bioinformatics, etc.
7. Optimization:
Machine learning also focuses on optimization tasks, where the system learns to find the
best solution for a given problem. For example, route optimization for logistics companies.
8. Real-Time Learning:
Systems should be able to adapt in real-time as they are exposed to new data. This is
important in fields like autonomous driving, where decisions must be made instantly
based on the current environment.
9. Human-like Learning:
Another goal is to create models that mimic human cognitive abilities. This includes
learning from small amounts of data, understanding context, and transferring knowledge
from one domain to another (transfer learning).
Applications of Machine Learning
1. Healthcare:
Machine learning supports disease detection and medical diagnosis based on patient data.
2. Autonomous Systems:
Self-Driving Cars:
Autonomous vehicles rely on machine learning to understand their environment, make
driving decisions, and navigate roads safely.
Reinforcement learning algorithms help in real-time decision-making based on sensor data.
Drones:
Drones use ML algorithms for tasks such as object detection, path planning, and
autonomous navigation.
3. Natural Language Processing:
Language Translation:
Machine learning is behind services like Google Translate, enabling real-time translation
between different languages.
Sentiment Analysis:
Businesses use ML to analyze customer reviews, social media posts, and feedback to
gauge public sentiment.
Speech Recognition:
Voice assistants like Alexa, Siri, and Google Assistant use machine learning for speech
recognition, understanding natural language queries.
4. Finance:
Fraud Detection:
Machine learning models are used to detect fraudulent transactions in real time by
analyzing patterns in transaction data.
Credit Scoring:
Banks and financial institutions use machine learning to assess the creditworthiness of
individuals based on their financial histories.
Algorithmic Trading:
ML algorithms analyze vast amounts of market data to make buy/sell decisions in real-time
to maximize profits in stock trading.
5. Retail and E-Commerce:
Recommendation Systems:
Retail platforms like Amazon and Netflix use recommendation engines to suggest products,
movies, or music to users based on their past behavior.
Dynamic Pricing:
ML algorithms dynamically adjust product prices based on demand, competitor prices, and
customer behavior.
Inventory Management:
Machine learning helps optimize inventory levels by predicting demand and automatically
reordering stock.
6. Manufacturing:
Predictive Maintenance:
Machine learning models are used to predict when machines or equipment will fail, enabling
preemptive maintenance and reducing downtime.
Quality Control:
Image recognition models are used to detect defects in products during the manufacturing
process, ensuring high-quality output.
7. Cybersecurity:
Threat Detection:
Machine learning algorithms monitor network traffic to detect suspicious activities and
potential security breaches.
Spam Filtering:
Email systems use ML to filter out spam and malicious content by learning patterns
associated with such emails.
8. Marketing:
Personalized Marketing:
Machine learning is used to analyze customer behavior, enabling businesses to tailor
advertisements to individuals’ preferences.
Customer Segmentation:
ML helps in identifying groups of customers with similar behaviors, allowing targeted
marketing strategies.
9. Agriculture:
10. Smart Cities:
Traffic Management:
Machine learning helps optimize traffic flow by predicting congestion patterns and adjusting
traffic signals in real time.
Energy Management:
ML models predict energy demand and optimize energy usage across cities, reducing
wastage and improving efficiency.
Summary
Machine learning is fundamentally changing how we approach problem-solving across a wide range of
industries. Its goals include automating tasks, improving performance through experience,
generalization, and real-time learning. ML applications are widespread, from healthcare and
autonomous systems to finance, retail, and even agriculture. By enabling systems to learn from data,
machine learning continues to unlock new possibilities and efficiencies across sectors.
See https://www.geeksforgeeks.org/design-a-learning-system-in-machine-learning/ for the design of a learning system.
Training Data
Training data is a critical aspect of developing any learning system in machine learning (ML). The
quality and nature of training data directly influence how well the model learns and how effectively it
performs in real-world applications.
Normalization: Scaling data values so that they fall within a standard range (e.g., 0 to
1) to avoid biases in model training.
Handling Missing Data: Filling in or removing missing values to ensure data
consistency.
Data Augmentation: In tasks like image recognition, data augmentation (e.g., flipping,
rotation, scaling) helps create a more diverse dataset and improve generalization.
5. Data Quality
High-quality training data is essential for effective learning. Poor quality data can lead to
inaccurate models. Some key dimensions of data quality include:
Accuracy: The data should accurately represent the real-world scenario.
Completeness: The data should not have large gaps, missing values, or incomplete
labels.
Consistency: The data should maintain a consistent structure, format, and labeling
scheme across all entries.
6. Data Quantity
The quantity of training data directly impacts the model’s ability to learn and generalize:
Small Datasets: With insufficient data, the model may not learn enough about the
problem domain, leading to underfitting.
Large Datasets: More data can improve model performance by reducing variance and
overfitting, but it also increases computational costs.
7. Data Diversity
A diverse dataset ensures that the model is exposed to a wide range of inputs, helping it to
generalize better. If the data lacks diversity, the model may become biased and perform
poorly on data points that are different from those seen during training.
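To make the preprocessing and data-quality points above concrete, here is a minimal scikit-learn sketch of normalization and missing-value handling; the small array of values is fabricated purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw feature matrix with a missing value (np.nan)
X_raw = np.array([[1200.0, 3.0],
                  [850.0, np.nan],
                  [2000.0, 4.0]])

# Handle missing data: fill gaps with the column mean
X_filled = SimpleImputer(strategy="mean").fit_transform(X_raw)

# Normalization: scale each feature into the range [0, 1]
X_scaled = MinMaxScaler().fit_transform(X_filled)
print(X_scaled)
```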
In supervised learning, the training data is a set of labeled input–output pairs:

$$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$

where each $x_i$ is an input (its features) and $y_i$ is the corresponding label. For example, in house price prediction, $x_i$ would be the features of a house (such as size and location) and $y_i$ would be the price of the house.
The goal is to learn a function $f(x)$ that maps input $x$ to the correct output $y$ based on the training data.
In unsupervised learning, the training data consists of inputs only:

$$\{x_1, x_2, \ldots, x_n\}$$
The system must find patterns, relationships, or clusters in the data without guidance on what the
correct output should be. For example, clustering algorithms such as k-means group data points into
clusters based on their similarities.
Examples of training data in different applications:
1. Image Recognition:
Training data consists of labeled images (e.g., photos tagged as "cat" or "dog") from which the model learns the visual features of each class.
2. Speech Recognition:
Training data consists of audio files paired with transcriptions. The system learns to map
sound waves (features) to words (labels), improving speech-to-text systems.
3. Natural Language Processing (NLP):
In NLP tasks like sentiment analysis, training data consists of text reviews labeled with their
sentiment (positive, negative, neutral). The model learns the relationship between word
sequences and sentiment labels.
4. Reinforcement Learning:
While reinforcement learning doesn’t use labeled data like supervised learning, it still relies
on experience data in the form of actions taken and rewards received. The system learns
through this interaction data, adjusting its behavior to maximize cumulative rewards.
Summary
Training data is a crucial element in the development of any learning system. It forms the basis from
which a machine learning model can learn, improve, and generalize to new data. The quality, quantity,
diversity, and relevance of the training data are all critical to the success of the learning system.
Proper preprocessing, handling of imbalances, and attention to data quality help ensure that the
system can achieve its goals, whether that’s image recognition, sentiment analysis, or predictive
maintenance in manufacturing.
Concept Representation
Concept representation is a crucial aspect of developing a learning system in machine learning
(ML). It refers to how information, knowledge, or abstract ideas (concepts) are encoded in a way that a
machine learning model can understand and use. Effective concept representation helps a learning
system capture important features, relationships, and patterns in the data, which directly affects its
ability to generalize and perform well on unseen data.
For example:
In a spam detection system, the concept of “spam” may be represented by features such as
the presence of certain keywords, email structure, or sender information.
In an image classification task, the concept of “cat” is represented by visual features like
edges, color patterns, and textures.
Key aspects of concept representation include:
1. Features:
Features are the measurable attributes or properties used to represent a concept in the data.
Example:
For predicting house prices, features might include square footage, location, number of
rooms, etc.
In image recognition, features like edges, shapes, and textures represent the concept of
objects within images.
2. Dimensionality of Features:
The dimensionality of the feature space is the number of features used to represent a
concept. High-dimensional data may contain more information but can also increase
computational complexity and the risk of overfitting.
Techniques like Principal Component Analysis (PCA) are used to reduce dimensionality
while retaining the most important aspects of the data.
Example:
In NLP, the concept of a sentence can be represented by vectors, where each word or
phrase is a feature in a high-dimensional space (e.g., word embeddings).
3. Feature Engineering:
Feature engineering involves transforming raw data into meaningful features that better
represent the underlying concepts. This step often requires domain knowledge and
experimentation.
Techniques include normalization, polynomial features, and encoding categorical variables.
Example:
In a fraud detection system, features like transaction frequency, location, and time of day
could be created from raw transaction data.
4. Representation Learning:
In cases where feature engineering is not feasible (e.g., raw images or audio),
representation learning techniques such as deep learning are used to automatically learn
the best way to represent concepts from the data itself.
Convolutional Neural Networks (CNNs), for instance, automatically learn hierarchical
features for image data, where lower layers capture basic features like edges, and higher
layers capture complex patterns like objects.
5. Handling Missing or Incomplete Data:
Real-world data often has missing or incomplete information. Ensuring the learning system
handles these gaps correctly is essential for robust concept representation.
Approaches include imputing missing values, using algorithms that can handle missing
data, or discarding incomplete data points.
Example:
In healthcare data, patient records may have missing information. Methods such as
imputing the mean value or using advanced techniques like matrix factorization can help
represent the concept of a patient’s health status accurately.
6. Data Types and Structures:
Concepts can be represented using different data types and structures depending on the
task:
Structured data: Tabular data where each row represents an instance and each
column represents a feature (e.g., sales data).
Unstructured data: Data such as text, images, or audio that does not follow a
predefined structure. Representation techniques like word embeddings for text and
pixel arrays for images are commonly used.
Graph-based data: In cases like social networks, the relationships between entities
are as important as the entities themselves. Graph-based representations capture
both the nodes (entities) and edges (relationships).
7. Conceptual Hierarchies and Ontologies:
Some learning systems may benefit from representing concepts as part of a hierarchy or
ontology. This is especially important in domains where concepts are naturally organized
into categories and subcategories (e.g., taxonomies).
Example:
In medical diagnosis, diseases might be represented in a hierarchical structure, with
broader categories (e.g., respiratory diseases) broken down into more specific ones (e.g.,
asthma, bronchitis).
Common techniques for representing concepts include:
1. One-Hot Encoding:
A simple method for representing categorical data. Each category is represented by a binary vector, where a single element is "1" (indicating the presence of the category), and all others are "0."
Example:
For a feature like "color" with values {red, green, blue}, the representation could be:
Red: [1, 0, 0]
Green: [0, 1, 0]
Blue: [0, 0, 1]
2. Word Embeddings:
In Natural Language Processing (NLP), word embeddings like Word2Vec or GloVe map
words into continuous vector spaces, where semantically similar words are close to each
other in the vector space.
Example:
Words like "king" and "queen" may be represented as vectors with a similar structure,
capturing their relationship to each other.
3. Bag-of-Words (BoW):
Represents a piece of text as a vector of word counts, ignoring word order and grammar.
Example:
For the text “The cat sat on the mat,” the BoW representation could be [cat: 1, sat: 1, mat: 1,
the: 2, on: 1].
4. Vector Representations:
Some models use latent variables to represent underlying hidden concepts that are not
directly observable but inferred from data. These are common in models like Hidden
Markov Models (HMMs) or Autoencoders.
Example:
In topic modeling, latent variables might represent abstract topics that explain the
distribution of words in a document.
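A short sketch of the one-hot and bag-of-words representations described above, using scikit-learn (a reasonably recent version is assumed for get_feature_names_out); the categories and sentences are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# One-hot encoding of a categorical "color" feature
colors = np.array([["red"], ["green"], ["blue"]])
one_hot = OneHotEncoder().fit_transform(colors).toarray()
print(one_hot)  # each row is a binary vector with a single 1

# Bag-of-words representation: word counts, ignoring word order
texts = ["The cat sat on the mat", "The dog sat"]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())
print(bow.toarray())
```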
Summary
Concept representation is a fundamental aspect of developing a learning system, as it dictates how
information is encoded and interpreted by machine learning models. This involves selecting,
engineering, and structuring features to best capture the underlying patterns and relationships in data.
Proper representation improves generalization, model performance, and interpretability. Several
techniques, such as one-hot encoding, word embeddings, and vector representations, are used to
represent different types of data, ensuring the learning system can effectively understand and process
the concepts it needs to learn.
Function Approximation
In most machine learning problems, the true underlying function that maps inputs to outputs is
unknown or too complex to model explicitly. Function approximation allows learning systems to infer
this mapping from the data and make useful predictions.
For example:
In a house price prediction problem, f (x) could represent the true relationship between house
features (like area, number of rooms) and house price. The learning algorithm approximates this
function using the available training data.
The function approximation process seeks to minimize the difference between the true function f (x)
and the learned function f^(x).
1. Target Function f(x):
The target function represents the real-world relationship between input features and
outputs. It is often complex, nonlinear, and unknown, which is why a learning system aims
to approximate it.
Example:
In a medical diagnosis system, f (x) might represent the underlying relationship between a
patient’s symptoms (input x) and the correct diagnosis (output y).
2. Hypothesis Space H:
The hypothesis space refers to the set of all possible functions that the learning system
can choose from to approximate the target function. The model determines the size and
flexibility of the hypothesis space.
For example, a linear model has a smaller hypothesis space because it can only represent
linear relationships, while a neural network has a much larger hypothesis space and can
capture complex, nonlinear relationships.
3. Learning Algorithm:
The learning algorithm is responsible for selecting the best function f^(x) from the
hypothesis space by minimizing the error between the predicted outputs and the actual
outputs in the training data.
Common learning algorithms include gradient descent, which adjusts model parameters to
minimize the error.
4. Loss Function:
The loss function measures how well the approximate function f^(x) performs on the
training data. It quantifies the difference between the predicted output y^ and the true output
y.
Common loss functions:
Mean Squared Error (MSE) for regression tasks:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
5. Model Complexity:
Model complexity refers to how flexible and powerful the function approximator is. A simple
model like linear regression may approximate only simple relationships, while complex
models like neural networks can approximate highly nonlinear functions.
However, high complexity can lead to overfitting, where the model learns the noise in the
training data rather than the true underlying pattern.
6. Capacity of the Model:
The capacity of the model refers to its ability to approximate complex functions. Higher-
capacity models, such as deep neural networks, can represent more complex functions, but
they also come with a higher risk of overfitting.
Regularization techniques like L1/L2 regularization or dropout are often used to control
model capacity and prevent overfitting.
Common function approximation techniques include:
1. Linear Models (Linear Regression):
Linear regression approximates the target function as a weighted sum of the input features:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
Linear models are easy to interpret and computationally efficient but can only approximate
simple, linear relationships.
2. Polynomial Regression:
Polynomial regression extends linear regression by allowing the model to fit nonlinear
relationships using polynomial terms:
$$\hat{y} = w_0 + w_1 x + w_2 x^2 + \cdots + w_n x^n$$
While this increases the flexibility of the model, it can also lead to overfitting if the degree of
the polynomial is too high.
3. Decision Trees:
Decision trees are non-parametric models that approximate functions by recursively
partitioning the input space into regions where the output is constant (or nearly constant).
Decision trees can approximate complex, nonlinear functions but may suffer from overfitting
without proper pruning.
4. Support Vector Machines (SVMs):
Support Vector Machines are effective function approximators for both classification and
regression. They use kernels to map the input data into higher-dimensional spaces where
linear relationships can be used to approximate complex functions.
The SVM tries to find the hyperplane that maximizes the margin between data points from
different classes.
5. Neural Networks:
Artificial Neural Networks (ANNs) are powerful function approximators that can capture
highly nonlinear and complex relationships. They consist of layers of interconnected
neurons, where each neuron applies a transformation to the inputs.
Deep Learning models, such as Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs), are widely used for tasks like image recognition, NLP, and time-
series forecasting.
The general form of a neural network's approximation function is:
$$\hat{y} = f(W \cdot x + b)$$
where W is a weight matrix, b is the bias, and f is an activation function like ReLU, sigmoid,
or tanh.
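To illustrate function approximation and the effect of model capacity, the sketch below fits polynomials of increasing degree to synthetic noisy data; the target function, noise level, and degrees are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)  # unknown target + noise

# Approximate f(x) with polynomials of increasing capacity using least squares
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)     # fit w_0 .. w_degree
    y_hat = np.polyval(coeffs, x)         # evaluate the learned function
    mse = np.mean((y - y_hat) ** 2)       # training error (MSE)
    print(f"degree={degree}  training MSE={mse:.4f}")
```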
Common applications of function approximation include:
1. Regression Tasks:
Predicting continuous values such as house prices or temperatures.
Models like linear regression, decision trees, and neural networks are commonly used for regression-based function approximation.
2. Classification Tasks:
Predicting discrete labels such as spam vs. non-spam emails, disease diagnosis, or
sentiment analysis.
Common models include decision trees, SVMs, and neural networks (especially deep
learning).
3. Time Series Forecasting:
Predicting future values in time series data, such as sales forecasting or weather prediction,
involves function approximation of time-dependent relationships.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models are often
used for these tasks.
4. Control Systems:
In robotics or autonomous vehicles, function approximation is used to model control
systems that interact with dynamic environments, enabling decision-making based on
sensor inputs.
Summary
Function approximation is a central part of developing machine learning systems. It involves
approximating an unknown target function based on data to make predictions and generalize well on
unseen data. Various techniques like linear models, decision trees, SVMs, and neural networks offer
different levels of flexibility and complexity in approximating functions. Proper selection of models,
attention to challenges like overfitting and underfitting, and effective use of regularization methods
help improve function approximation and ensure the learning system performs optimally in real-world
applications.
Types of Learning
Types of learning are categorized based on how the machine learning model is trained, and how
much supervision or labeled data is available during the training phase. The two main types of
learning are Supervised Learning and Unsupervised Learning. Each type is suitable for specific
tasks and comes with unique methodologies, advantages, and challenges.
1. Supervised Learning
Supervised learning is a type of learning where the model is trained on a labeled dataset, meaning
that each input has a corresponding known output (also called labels or targets). The goal of the
model is to learn a mapping function from input to output so that it can predict the outputs for new,
unseen inputs.
1. Labeled Data:
In supervised learning, the dataset consists of input-output pairs, where the input X
(features) is associated with a corresponding known output Y (label). The learning algorithm
uses this labeled data to understand the relationship between X and Y .
2. Mapping Function:
The model learns a function f that maps input X to output Y :
Y = f (X)
During training, the model iteratively improves this function to minimize the difference
between the predicted output Y^ and the true output Y .
3. Training Phase:
The model is trained on a training set containing labeled data. The goal is to adjust the
model’s parameters to minimize the error (loss) between the predicted output and the actual
output.
4. Test Phase:
After training, the model’s performance is evaluated using a test set that contains input data
without known labels. The model uses the learned function f (X) to make predictions, and
its accuracy is determined by comparing predicted labels with actual labels.
1. Classification:
In classification problems, the output is categorical (e.g., spam vs. non-spam emails,
disease diagnosis). The goal is to assign input data to predefined classes.
Example: Predicting whether an email is spam or not, based on its contents.
2. Regression:
In regression problems, the output is continuous (e.g., predicting house prices or
temperature). The model learns to predict a numerical value based on the input features.
Example: Predicting house prices based on features like area, number of rooms, etc.
3. Applications:
Image recognition: Models are trained on labeled images to classify objects within the
image (e.g., recognizing cats vs. dogs).
Speech recognition: Models learn to map speech audio to text transcriptions.
Medical diagnosis: Predicting whether a patient has a disease based on medical data.
Advantages of Supervised Learning
Accuracy: Supervised learning tends to produce highly accurate models because it learns from
labeled data, which guides the learning process.
Predictive Power: Models can generalize to make predictions on unseen data, especially if the
training data is comprehensive.
Versatility: It can be applied to a wide variety of tasks, from classification and regression to more
complex problems like image and speech recognition.
2. Unsupervised Learning
Unsupervised learning is a type of learning where the model is trained on an unlabeled dataset,
meaning there are no known outputs provided with the inputs. The goal is for the model to discover
hidden patterns, structures, or relationships in the data without any explicit guidance.
1. Unlabeled Data:
In unsupervised learning, the dataset contains only input data X, without corresponding
output labels. The model must find patterns or structure within the data autonomously.
2. Pattern Discovery:
The goal of unsupervised learning is not to make specific predictions, but to uncover hidden
structures or groupings in the data, such as clusters or associations.
3. No Supervision:
Since there are no labels to guide the learning process, the model relies on statistical
techniques and algorithms to detect underlying structures in the data.
1. Clustering:
Clustering is one of the most common tasks in unsupervised learning. It involves grouping
data points that are similar to one another into clusters.
Example: Customer segmentation in marketing, where customers with similar behaviors
are grouped into clusters for targeted advertising.
Common algorithms: K-Means, DBSCAN, Hierarchical Clustering.
2. Dimensionality Reduction:
Dimensionality reduction is the process of reducing the number of input variables (features)
while preserving as much information as possible. This is especially useful when working
with high-dimensional data.
Example: Principal Component Analysis (PCA) is a popular technique used to reduce the
dimensions of data for easier visualization or to remove noise.
3. Anomaly Detection:
Unsupervised learning is often used for anomaly detection, where the goal is to identify rare
or unusual data points (anomalies) that do not conform to the general pattern of the data.
Example: Fraud detection in banking transactions, where unusual behavior may indicate
fraudulent activity.
4. Association Rule Learning:
This involves finding interesting relationships between variables in large datasets. It is
commonly used in market basket analysis to identify items that frequently co-occur in
transactions.
Example: In a supermarket, identifying that customers who buy bread also tend to buy
butter.
Advantages of Unsupervised Learning
No Need for Labeled Data: Unsupervised learning can work on unlabeled data, which is often
cheaper and easier to acquire than labeled data.
Data Exploration: It allows for the discovery of hidden patterns, groupings, or associations in
data that may not have been obvious before.
Real-World Applicability: Many real-world datasets are unlabeled, making unsupervised
learning a valuable tool for various applications, such as customer segmentation or anomaly
detection.
Challenges of Unsupervised Learning
Interpretability: The patterns discovered by unsupervised models are often harder to interpret
and may not always align with meaningful real-world concepts.
Evaluation: It is difficult to evaluate the quality of unsupervised learning since there are no
predefined labels to compare against.
Uncertainty in Results: The results from unsupervised learning can be uncertain or ambiguous,
as there is no clear criterion for what constitutes a "good" pattern or grouping.
| Aspect | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Goal | Learn a mapping from input to output for prediction | Discover hidden patterns or groupings in data |
| Common Tasks | Classification, regression | Clustering, anomaly detection, dimensionality reduction |
| Example | Predicting house prices | Customer segmentation |
| Model Evaluation | Accuracy, precision, recall, etc. | Often subjective, based on the discovered patterns |
| Advantages | High accuracy, good for prediction tasks | No need for labeled data, useful for exploration |
| Challenges | Requires labeled data, risk of overfitting | Hard to interpret, difficult to evaluate |
Summary
Supervised learning is ideal for prediction tasks when labeled data is available, such as in
classification and regression problems. It is more accurate and easier to evaluate but depends on
having a good amount of labeled data. Unsupervised learning, on the other hand, is used when
there is no labeled data, with the goal of finding hidden patterns or groupings within the data. It is
useful for exploratory tasks like clustering, anomaly detection, and dimensionality reduction, though its
results are harder to evaluate and interpret. Both types of learning have their own advantages,
challenges, and are widely used in various artificial intelligence applications.
Classification is a fundamental task in machine learning where the goal is to assign input data to one
of several predefined categories or labels. The setup of a classification task typically involves defining
the problem, preparing the data, selecting an appropriate model, and evaluating its performance.
The objective is to learn a decision function from labeled training data and classify unseen data into one of these categories.
1. Input Data:
The input data consists of features X that describe the object or entity being classified.
Each feature represents an attribute or characteristic of the input.
Example: For classifying emails as spam or not spam, features could be the frequency of
certain words, presence of attachments, or length of the subject line.
2. Target Classes (Labels):
The possible outputs or target classes are the predefined categories into which the input
data can be classified.
Example: In a binary classification task, the target labels could be {0, 1} or
{spam, not spam}. In multi-class classification, there could be more than two labels.
3. Prediction Function:
The model learns a function that maps input features to a predicted class label:

$$\hat{Y} = f(X)$$
The function f (X) is learned during the training phase based on labeled training data, and it
should be able to generalize well to unseen data during the test phase.
Data splitting: Split the dataset into training and test sets, typically in a 70:30 or
80:20 ratio.
3. Model Selection:
Select a suitable classification algorithm based on the problem, dataset size, and desired
accuracy. Common models include:
Logistic Regression: Often used for binary classification.
k-Nearest Neighbors (k-NN): Non-parametric method for both binary and multi-class
classification.
Support Vector Machines (SVM): Effective in high-dimensional spaces.
Decision Trees: Easy to interpret and handle both categorical and numerical data.
Neural Networks: Powerful for complex, non-linear problems, especially in multi-class
tasks.
4. Model Training:
Train the model on the training data by providing input features X and their corresponding
labels Y . The model learns patterns in the data by minimizing a loss function (e.g., cross-
entropy loss for classification tasks).
The learning process involves updating the model parameters iteratively to improve the
prediction accuracy.
5. Model Evaluation:
After training, evaluate the model's performance on the test set to check its generalization
ability. Common metrics for classification tasks include:
Accuracy: The proportion of correctly predicted labels over all instances.
Precision, Recall, F1-Score: These metrics are especially important when dealing
with imbalanced datasets (e.g., more non-spam than spam emails).
Confusion Matrix: A table showing the actual vs. predicted classifications, providing
insight into false positives and false negatives.
6. Model Tuning:
Adjust the model's hyperparameters to improve performance. This may include tuning the
learning rate, regularization strength, tree depth (for decision trees), or kernel function (for
SVMs).
Techniques like grid search or random search are commonly used to find the best
hyperparameters.
Summary
The setup of a classification task involves several important steps, starting from data collection and
preprocessing to model selection, training, and evaluation. Classification can be applied to a variety of
real-world tasks, such as email spam detection, image recognition, and medical diagnosis. Properly
understanding the problem setup, selecting the right model, and evaluating performance metrics
ensures that the classification system performs well and generalizes to unseen data.
Training
During training, the model's parameters are adjusted to minimize a loss function over the training data. Common loss functions and the parameter update rule are:

Binary cross-entropy loss (for two classes):

$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

Categorical cross-entropy loss (for M classes):

$$\text{Loss} = -\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log(\hat{y}_{ij})$$

Hinge loss (used by SVMs):

$$\text{Loss} = \max(0,\ 1 - y_i f(x_i))$$

Gradient descent parameter update:

$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta \text{Loss}(\theta_t)$$
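A minimal sketch of the training loop implied by these formulas: logistic regression trained by gradient descent on the binary cross-entropy loss. The synthetic data, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # synthetic binary labels

w, b, eta = np.zeros(2), 0.0, 0.1              # parameters theta and learning rate eta
for epoch in range(100):
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))     # forward pass (sigmoid)
    loss = -np.mean(y * np.log(y_hat + 1e-9) + (1 - y) * np.log(1 - y_hat + 1e-9))
    grad_w = X.T @ (y_hat - y) / len(y)        # gradient of the loss w.r.t. w
    grad_b = np.mean(y_hat - y)                # gradient of the loss w.r.t. b
    w, b = w - eta * grad_w, b - eta * grad_b  # theta_{t+1} = theta_t - eta * gradient
print("final training loss:", round(loss, 4))
```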
1. Splitting Data:
The available labeled dataset is typically split into two parts:
Training set: Used to train the model.
Validation set: Used to tune the model’s hyperparameters and assess performance
during training.
2. Feature Selection and Preprocessing:
The input features are selected and preprocessed before training begins. This might involve
normalizing the features, encoding categorical variables, or performing dimensionality
reduction.
3. Model Training:
The model is trained on the training set using the forward-backward pass steps described
earlier. The goal is to minimize the error on the training set while maintaining generalization
to unseen data (i.e., avoiding overfitting).
4. Model Tuning:
Hyperparameters like learning rate, regularization strength, and batch size are tuned during
training to improve model performance.
1. Overfitting:
Definition: Overfitting occurs when the model performs well on the training data but poorly
on new, unseen data. This usually happens when the model is too complex and learns
noise or random fluctuations in the training data.
Solutions:
Regularization: Techniques like L2 regularization (Ridge), L1 regularization (Lasso), or
dropout (for neural networks) help prevent overfitting by penalizing large weights.
Cross-Validation: Use of techniques like k-fold cross-validation ensures the model
generalizes well across different subsets of the data.
2. Underfitting:
Definition: Underfitting occurs when the model is too simple and cannot capture the
underlying patterns in the training data. This results in poor performance on both the training
and test sets.
Solutions:
Increase model complexity (e.g., adding more layers in a neural network, using a more
complex algorithm).
Use more informative features or better feature engineering.
3. Imbalanced Data:
In many real-world classification problems, the classes are not equally represented. For
example, in medical diagnosis, the number of cases of a rare disease might be much
smaller than the number of healthy cases.
Solutions:
Resampling Techniques: Oversampling the minority class or undersampling the
majority class.
Class Weights: Assigning higher weights to the minority class during training.
4. Selection of Appropriate Model:
Different classification tasks require different models. For example, logistic regression may
work well for simple binary classification, while deep learning models like Convolutional
Neural Networks (CNNs) might be required for image classification tasks.
Solutions: Careful model selection based on the complexity of the problem and available
computational resources.
Summary
Training is the phase in classification where the model learns to map input data to the correct class
labels by minimizing the prediction error. It involves several key steps, including initializing the model,
calculating the loss, updating the model parameters, and iterating through the training data for multiple
epochs. Challenges like overfitting, underfitting, and imbalanced data can affect the training process
but can be mitigated through techniques such as regularization, cross-validation, and early stopping.
Proper training ensures that the model generalizes well and performs accurately on unseen data,
which is the ultimate goal of any classification system.
Testing
1. What is Testing?
Testing is a crucial phase in the classification process, following the training phase. During this stage,
the model's performance is evaluated using a separate test dataset that it has not encountered before.
The goal is to assess how well the trained model can generalize to new, unseen data. This section
outlines the key aspects of the testing process, evaluation metrics, and the importance of testing in
machine learning.
2. Importance of Testing
Model Evaluation: Testing provides insights into the model's accuracy, robustness, and
reliability when predicting unseen data.
Generalization Ability: The primary goal of a classification model is to generalize well to new
data. Testing helps determine if the model is overfitting or underfitting.
Performance Comparison: Testing allows for the comparison of different models or algorithms
to identify the most effective solution for a specific classification problem.
Identifying Bias and Variance: Through testing, it's possible to assess how well the model
balances bias (error due to overly simplistic assumptions) and variance (error due to excessive
complexity).
Evaluation Metrics
Various metrics are calculated to evaluate the model's performance based on its predictions. These metrics help assess accuracy, precision, recall, and more.
1. Accuracy:
The proportion of correctly classified instances (both positive and negative) out of all instances in the test set.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
2. Precision:
The proportion of true positive predictions to the total positive predictions made by the
model.
$$\text{Precision} = \frac{TP}{TP + FP}$$
where:
TP : True Positives (correctly predicted positive cases)
FP : False Positives (incorrectly predicted positive cases)
3. Recall (Sensitivity):
The proportion of true positive predictions to the actual number of positive instances in the
test set.
$$\text{Recall} = \frac{TP}{TP + FN}$$
where:
FN : False Negatives (incorrectly predicted negative cases)
4. F1 Score:
The harmonic mean of precision and recall, providing a single score that balances both
metrics.
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
5. Confusion Matrix:
A matrix that summarizes the performance of the classification model, showing true positive,
true negative, false positive, and false negative predictions. It helps visualize the
performance across different classes.
|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | TP                 | FN                 |
| Actual Negative | FP                 | TN                 |
6. Receiver Operating Characteristic (ROC) Curve:
A graphical representation of the trade-off between true positive rate (recall) and false
positive rate across different threshold settings. The area under the ROC curve (AUC) is
often used as a performance measure.
The higher the AUC, the better the model's ability to distinguish between classes.
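These metrics can be computed directly with scikit-learn, as in the sketch below; the label and probability arrays are invented to keep the example small.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical true labels, predicted labels, and predicted probabilities
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))
```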
Example: Testing a Spam Classifier
1. Data Preparation:
The test dataset consists of a separate set of labeled emails, not used in training.
2. Model Evaluation:
The trained model is applied to the test dataset to predict whether each email is spam or not
spam.
3. Calculating Metrics:
The confusion matrix is created to assess the predictions:
|                 | Predicted Spam | Predicted Not Spam |
| --------------- | -------------- | ------------------ |
| Actual Spam     | TP             | FN                 |
| Actual Not Spam | FP             | TN                 |
From the confusion matrix, calculate accuracy, precision, recall, and F1 score to evaluate
model performance.
4. Interpreting Results:
If the model has high accuracy but low recall, it may indicate a problem with classifying
spam emails (many false negatives), prompting a review of features or retraining the model
with adjusted hyperparameters.
Summary
Testing is a critical phase in the classification process, where the trained model is evaluated using a
separate dataset to measure its performance and generalization ability. By calculating various
evaluation metrics such as accuracy, precision, recall, F1 score, and utilizing tools like the confusion
matrix and ROC curve, one can assess the effectiveness of the classification model. Proper testing
ensures the model's reliability in real-world scenarios and aids in identifying areas for improvement.
Validation Dataset
In a typical machine learning workflow, the available data is divided into three datasets:
1. Training Dataset: Used to fit the model. It contains labeled examples that the model learns from.
2. Validation Dataset: Used for tuning the model and hyperparameters. It provides an indication of
the model's performance during training.
3. Test Dataset: Used for final evaluation of the model after training and validation. It should never
be used during the training or validation phases to maintain its integrity for assessing
generalization.
Common data splitting strategies include:
1. Holdout Split:
The dataset is split once into separate training, validation, and test subsets (for example, 80/10/10).
2. K-Fold Cross-Validation (see the sketch after this list):
In k-fold cross-validation, the dataset is divided into k subsets (or folds). The model is
trained k times, each time using a different fold as the validation set and the remaining k-1
folds as the training set. This provides a more robust evaluation of model performance.
Example: If k=5, the dataset is split into 5 parts. The model trains on 4 parts and validates
on the 1 remaining part. This process repeats for each fold.
3. Stratified Splitting:
Stratified splitting ensures that each class is represented proportionally in all subsets
(training, validation, and test sets). This is particularly useful in imbalanced datasets to
ensure that each split maintains the distribution of classes.
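A short sketch of 5-fold cross-validation with scikit-learn; the synthetic dataset and the choice of logistic regression are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 5-fold cross-validation: train on 4 folds, validate on the remaining fold, repeat
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores.round(3))
print("mean accuracy  :", scores.mean().round(3))
```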
Example: Using a Validation Set in an Image Classification Task
1. Data Preparation:
The dataset of labeled images is split into three parts: training (80%), validation (10%), and
test (10%).
2. Training Phase:
The model is trained on the training dataset, and its performance is evaluated on the
validation dataset after each epoch.
3. Hyperparameter Tuning:
The validation dataset is used to fine-tune hyperparameters, such as the learning rate and
batch size, to improve performance.
4. Early Stopping:
If the validation accuracy does not improve for several epochs, training is stopped early to
prevent overfitting.
5. Final Evaluation:
Once training is complete, the model's performance is evaluated on the test dataset, which
provides an unbiased estimate of its generalization ability.
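A minimal sketch of the 80/10/10 split and early stopping described above, using SGDClassifier as a stand-in incremental model (the loss name "log_loss" assumes scikit-learn ≥ 1.1); the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# 80/10/10 split into training, validation, and test sets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)
best_val, patience, stalled = 0.0, 5, 0
for epoch in range(100):
    model.partial_fit(X_train, y_train, classes=np.unique(y))  # one training pass
    val_acc = model.score(X_val, y_val)                        # monitor validation accuracy
    if val_acc > best_val:
        best_val, stalled = val_acc, 0
    else:
        stalled += 1
    if stalled >= patience:                                    # early stopping
        print(f"stopping at epoch {epoch}, best validation accuracy {best_val:.3f}")
        break

print("test accuracy:", round(model.score(X_test, y_test), 3))
```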
Summary
A validation dataset is an essential component of the machine learning workflow, serving as a tool for
hyperparameter tuning, model selection, and preventing overfitting. By properly splitting the dataset
and applying best practices, practitioners can ensure that their models are robust, generalizable, and
ready for deployment in real-world applications. The validation phase is crucial for fine-tuning the
model before it is evaluated against the test dataset, ultimately leading to improved performance and
reliability.
See https://www.geeksforgeeks.org/training-vs-testing-vs-validation-sets/ for the differences and examples.
Overfitting is a common issue in machine learning, particularly in classification tasks. It occurs when
a model learns the training data too well, capturing noise and outliers rather than the underlying
distribution of the data. This results in poor generalization to new, unseen data. Understanding
overfitting, its causes, symptoms, and techniques to mitigate it is crucial for developing effective
machine learning models.
1. What is Overfitting?
Overfitting happens when a model is too complex relative to the amount of training data available. It
essentially memorizes the training data instead of learning to generalize from it. Consequently, while
the model performs exceptionally well on the training dataset, it fails to perform adequately on
validation and test datasets.
2. Causes of Overfitting
Several factors can contribute to overfitting:
Model Complexity: Models with a large number of parameters (e.g., deep neural networks)
have a greater capacity to memorize the training data, increasing the likelihood of overfitting.
Insufficient Training Data: A small training dataset can lead the model to learn specific patterns
that do not generalize well to broader datasets.
Noise in the Data: If the training data contains a lot of noise or outliers, the model may learn to
recognize these anomalies instead of the actual signal.
Lack of Regularization: Regularization techniques help constrain the complexity of the model.
Without them, the model may adapt too closely to the training data.
3. Symptoms of Overfitting
High Training Accuracy, Low Validation/Test Accuracy: The most common sign of overfitting
is a significant discrepancy between training and validation/test performance. The model
achieves high accuracy on the training set but shows poor performance on validation/test sets.
Learning Curve Behavior: When plotting the training and validation accuracy over training
epochs, an overfitting model will show an increasing training accuracy while the validation
accuracy plateaus or starts to decline.
Complex Decision Boundaries: Overfitted models often create overly complex decision
boundaries that tightly follow the training data points rather than forming a smooth approximation.
4. Techniques to Mitigate Overfitting
1. Cross-Validation:
Techniques such as k-fold cross-validation give a more reliable estimate of how the model will perform on unseen data and help detect overfitting early.
2. Regularization:
A penalty term is added to the loss function to discourage overly complex models:

$$\text{Loss}_{\text{total}} = \text{Loss} + \lambda R(w)$$

where:
R(w) is the regularization term (L1 or L2).
λ controls the strength of the regularization.
3. Simplifying the Model:
Reducing the complexity of the model (e.g., using fewer layers or nodes in a neural
network) can help prevent overfitting. A simpler model has fewer parameters, making it less
likely to memorize the training data.
4. Early Stopping:
Monitoring the model's performance on a validation dataset during training and stopping the
training process once the validation performance starts to degrade can prevent overfitting.
5. Data Augmentation:
Generating additional training examples through techniques such as rotation, translation, or
scaling can enhance the training dataset, making the model more robust and less prone to
overfitting.
6. Dropout:
In neural networks, dropout is a regularization technique that randomly sets a portion of the
neurons to zero during training. This prevents the model from becoming too reliant on
specific neurons and promotes more robust feature learning.
7. Collecting More Data:
If feasible, increasing the size of the training dataset can help mitigate overfitting by
providing the model with more diverse examples to learn from.
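A small sketch of overfitting and one mitigation, L2 regularization: a high-degree polynomial is fit to noisy data with and without a ridge penalty. The data, polynomial degree, and penalty strength are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 1, 15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(scale=0.2, size=15)
x_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel()

for name, reg in [("no regularization", LinearRegression()),
                  ("L2 regularization", Ridge(alpha=0.01))]:
    # Degree-12 polynomial: high capacity relative to only 15 training points
    model = make_pipeline(PolynomialFeatures(degree=12), reg)
    model.fit(x_train, y_train)
    train_err = np.mean((model.predict(x_train) - y_train) ** 2)
    test_err = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"{name}: train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```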
5. Example: Overfitting in Polynomial Regression
3. Visualizing Overfitting:
A plot of the training data, the overfitted polynomial curve, and the test data shows the curve
oscillating excessively, illustrating how the model captures noise rather than the underlying
trend.
6. Summary
Overfitting is a critical challenge in machine learning, resulting from a model's excessive complexity
relative to the training data. It is characterized by high training accuracy and poor generalization to
validation and test datasets. By employing techniques such as cross-validation, regularization,
simplifying models, early stopping, and data augmentation, practitioners can effectively mitigate the
risks of overfitting and develop models that generalize well to new data. Recognizing the symptoms of
overfitting and applying appropriate strategies is essential for building robust and reliable machine
learning systems.
Linear discriminative models are an essential category of classification algorithms used in machine
learning. These models classify data by establishing a linear boundary to separate different classes in
the feature space. This section delves deeper into linear discriminative models, exploring their
characteristics, mathematical formulations, variations, advantages, limitations, and practical
applications.
2. Mathematical Formulation
The general mathematical representation of a linear discriminative model can be expressed as:
$$f(X) = w^T X + b$$

where $w$ is the weight vector, $X$ is the input feature vector, and $b$ is the bias term.
The decision boundary is determined by the equation f (X) = 0. The model classifies an instance as
belonging to class 1 if f (X) > 0 and to class 0 if f (X) ≤ 0.
Common linear discriminative models include:
1. Logistic Regression:
Logistic regression models the probability of class membership by applying the sigmoid function to the linear score:

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-f(X)}}$$
The model is trained using maximum likelihood estimation, optimizing the weights based on
the training data.
2. Support Vector Machine (SVM):
Linear SVM finds the hyperplane that maximizes the margin between two classes. The
optimization problem can be stated as:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{subject to } y_i(w^T x_i + b) \ge 1,\ \forall i$$
SVM can also be extended to handle non-linear boundaries through the kernel trick,
allowing it to perform well on more complex datasets.
3. Linear Discriminant Analysis (LDA):
LDA is a generative model that assumes that the features are normally distributed within
each class. It seeks to find a linear combination of features that maximizes the separation
between classes:
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix.
4. Perceptron:
The perceptron is an early neural network model that classifies instances by a linear
function. The weights are updated iteratively based on the misclassified points until
convergence.
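A brief scikit-learn sketch showing how the four variations listed above can be fit through a common interface (this assumes scikit-learn is installed; the dataset is synthetic and purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary-classification data.
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(),
    "Linear SVM": LinearSVC(),
    "LDA": LinearDiscriminantAnalysis(),
    "Perceptron": Perceptron(),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)                 # learn a linear decision boundary
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.2f}")
```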
1. Data Representation:
Feature vector: X = [x_1, x_2], where x_1 is age and x_2 is income.
2. Modeling:
The logistic regression model predicts the probability of purchase:
P(Y = 1 | X) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + b)}}
3. Training:
Using the training dataset, we optimize w_1, w_2, and b using maximum likelihood estimation.
4. Making Predictions:
For a new customer, we input their age and income to the model to obtain the probability of
purchase. If this probability exceeds a defined threshold (e.g., 0.5), the customer is
classified as likely to purchase.
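A minimal sketch of this workflow, assuming scikit-learn and a small made-up age/income dataset (the numbers below are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [age, income in thousands]; label = purchased (1) or not (0).
X_train = np.array([[25, 30], [47, 85], [35, 60], [52, 40], [23, 25], [44, 95]])
y_train = np.array([0, 1, 1, 0, 0, 1])

model = LogisticRegression()
model.fit(X_train, y_train)   # w1, w2 and b are estimated by maximum likelihood

# Predict purchase probability for a new customer (age 40, income 70k).
prob = model.predict_proba([[40, 70]])[0, 1]
print(f"P(purchase) = {prob:.2f}")
print("Likely to purchase" if prob > 0.5 else "Unlikely to purchase")
```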
8. Summary
Linear discriminative models are fundamental tools in machine learning for classification tasks. By
leveraging linear decision boundaries, these models can efficiently separate classes based on input
features. Understanding their mathematical formulations, variations, advantages, and limitations is
vital for applying them effectively in real-world applications. While linear discriminative models may not
capture complex relationships, their efficiency and interpretability make them valuable in many
scenarios, serving as a foundational approach to classification in artificial intelligence.
2. Mathematical Foundation
Non-linear discriminative models can be expressed in various forms depending on the specific
algorithm used. A general representation can be formulated as:
f(X) = g(w^T X + b)
where:
w is the weight vector, X is the input feature vector, b is the bias term, and g(⋅) is a non-linear activation function.
The choice of the activation function g(⋅) determines the nature of the non-linearity introduced into the
model.
1. Kernel Support Vector Machines (SVM):
The decision function introduces non-linearity through a kernel:
f(X) = \sum_{i=1}^{N} \alpha_i y_i K(X_i, X) + b
where K(⋅) is a kernel function (e.g., polynomial, Gaussian), transforming the input space
into a higher-dimensional space.
2. Artificial Neural Networks (ANNs):
ANNs consist of interconnected layers of neurons, where each neuron applies a non-linear
activation function to its input. The general form of a neural network model can be
expressed as:
f(X) = h(W^{(2)} \, g(W^{(1)} X + b^{(1)}) + b^{(2)})
where:
g(⋅) is the activation function of the first layer.
h(⋅) is the activation function of the output layer.
W^{(1)} and W^{(2)} are weight matrices, while b^{(1)} and b^{(2)} are bias vectors. (A numpy sketch of this two-layer forward pass appears after this list.)
3. Decision Trees:
Decision trees create non-linear decision boundaries by recursively partitioning the feature
space based on feature values. The output is determined by the majority class in the leaf
node where the instance falls.
4. Random Forests:
Random forests are an ensemble of decision trees, combining the outputs of multiple trees
to improve classification accuracy and robustness against overfitting.
5. Gradient Boosting Machines (GBMs):
GBMs build decision trees sequentially, where each new tree aims to correct the errors
made by the previous ones. This allows for flexible modeling of complex decision
boundaries.
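Referring back to item 2 above, the two-layer formula f(X) = h(W^{(2)} g(W^{(1)} X + b^{(1)}) + b^{(2)}) can be sketched in a few lines of numpy. The weights here are random placeholders rather than trained values; this only illustrates the forward computation, not training.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # hidden layer: 4 inputs -> 8 units
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # output layer: 8 units -> 1 output

def relu(z):        # g(.) : hidden-layer activation
    return np.maximum(0, z)

def sigmoid(z):     # h(.) : output activation for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """f(x) = h(W2 . g(W1 x + b1) + b2)"""
    hidden = relu(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

x = np.array([0.5, -1.2, 0.3, 0.8])   # one 4-dimensional input
print(forward(x))                      # probability-like output in (0, 1)
```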
1. Feature Representation:
The data points are represented in a two-dimensional feature space, but they are arranged
in concentric circles, making linear separation impossible.
2. Using a Non-Linear Kernel:
We apply a Gaussian (RBF) kernel to transform the data into a higher-dimensional space,
where a linear decision boundary can effectively separate the classes.
3. Decision Function:
The decision function in the transformed space becomes:
f(X) = \sum_{i=1}^{N} \alpha_i y_i K(X_i, X) + b
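A short scikit-learn sketch of this concentric-circles example (make_circles plus an RBF-kernel SVM; the parameter values are illustrative assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes arranged as concentric circles -- not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel struggles here, while the Gaussian (RBF) kernel separates the classes.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale")
    clf.fit(X_train, y_train)
    print(f"{kernel} kernel: test accuracy = {clf.score(X_test, y_test):.2f}")
```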
8. Summary
Non-linear discriminative models are a powerful class of algorithms capable of modeling complex
relationships in data. By allowing for non-linear decision boundaries, these models offer increased
flexibility and improved performance across various applications. Understanding the mathematical
foundations, variations, advantages, and limitations of non-linear models is essential for effectively
applying them in real-world scenarios, making them indispensable tools in the field of artificial
intelligence and machine learning.
Visual Representation
A simple representation of a decision tree is as follows:
                [Feature 1]
               /           \
             Yes            No
             /                \
      [Feature 2]         [Feature 3]
       /       \            /       \
     Yes       No         Yes       No
      |         |          |         |
  [Class A] [Class B]  [Class C] [Class D]
Gini Impurity: Measures how often a randomly chosen element would be misclassified. The goal is to minimize Gini impurity.
Gini(D) = 1 - \sum_{k=1}^{K} p_k^2
Entropy: Measures the level of uncertainty or disorder in the dataset. The goal is to
minimize entropy.
Entropy(D) = -\sum_{k=1}^{K} p_k \log_2(p_k)
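Both impurity measures are straightforward to compute directly from class proportions; a small numpy sketch (with a made-up label array) follows:

```python
import numpy as np

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2"""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy(D) = -sum_k p_k * log2(p_k)"""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array(["A", "A", "A", "B", "B", "C"])  # hypothetical class labels
print(f"Gini = {gini(labels):.3f}, Entropy = {entropy(labels):.3f}")
```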
1. Data Representation:
Features: Monthly charges, contract type, customer support calls, and payment method.
Target: Churn (Yes/No).
2. Choosing the Best Feature:
Calculate Gini impurity or entropy for each feature and select the one that provides the best
split.
3. Splitting the Data:
Based on the selected feature, split the dataset into subsets.
4. Repeat Process:
Continue splitting the subsets until the stopping criteria are met.
5. Final Tree:
The resulting decision tree will classify customers into churn or not churn based on their
features.
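A compact scikit-learn sketch of this churn example, assuming a small hand-made dataset with the listed features encoded numerically (the encoding and values are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded data: [monthly_charges, contract_type (0=monthly, 1=yearly),
#                             support_calls, payment_method (0=card, 1=invoice)]
X = np.array([[70, 0, 4, 1], [30, 1, 0, 0], [85, 0, 6, 1],
              [40, 1, 1, 0], [90, 0, 5, 0], [35, 1, 2, 1]])
y = np.array([1, 0, 1, 0, 1, 0])   # churn (1) or not (0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Inspect the learned splits and classify a new customer.
print(export_text(tree, feature_names=[
    "monthly_charges", "contract_type", "support_calls", "payment_method"]))
print("Churn prediction:", tree.predict([[75, 0, 3, 1]])[0])
```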
9. Summary
Decision trees are a powerful and intuitive method for classification and regression tasks in machine
learning. Their ability to represent decision-making processes in a straightforward manner makes
them widely used in various applications. While they have advantages in interpretability and handling
diverse data types, their limitations, such as susceptibility to overfitting and instability, must be
managed through appropriate techniques, including pruning and ensemble methods. Understanding
decision trees is essential for leveraging their capabilities in artificial intelligence and data-driven
decision-making.
Key Concepts:
Random Variables: Variables whose possible values are outcomes of a random phenomenon.
Probability Distributions: Mathematical functions that describe the likelihood of different
outcomes.
2. Conditional Models
Conditional models focus on modeling the conditional probability of the target variable given the
input features. They directly estimate the probability of the output variable based on the input data.
Mathematical Foundation:
The conditional probability can be expressed as:
P(Y | X)
where:
Y is the target variable and X represents the input features.
Common Examples:
1. Logistic Regression:
A linear model used for binary classification. The probability of the target class is modeled
using the logistic function:
P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}
3. Generative Models
Generative models aim to learn the joint probability distribution of the input features and the target
variable. They can generate new data points by modeling how the data is created.
Mathematical Foundation:
The joint probability can be expressed as:
P(X, Y) = P(Y) \, P(X | Y)
where:
X represents the input features and Y is the target variable.
Common Examples:
1. Naive Bayes Classifier:
Applies Bayes' theorem under the assumption that features are conditionally independent given the class:
P(Y | X_1, X_2, \dots, X_n) = \frac{P(Y) \prod_{i=1}^{n} P(X_i | Y)}{P(X_1, X_2, \dots, X_n)}
Sentiment Analysis: Probabilistic models can classify text into different sentiments (positive,
negative, neutral) by estimating the conditional probabilities of sentiment labels given the text
features.
Image Generation: Generative models like GANs (Generative Adversarial Networks) and GMMs
are used in generating realistic images and art.
Medical Diagnosis: Probabilistic models can assist in diagnosing diseases by estimating the
likelihood of various conditions based on patient symptoms and test results.
1. Data Representation:
Features: Words in the email (e.g., "free," "winner," "money").
Target: Spam (1) or Not Spam (0).
2. Calculating Prior Probabilities:
Calculate the prior probabilities for spam and not spam from the training dataset, e.g., P(Spam) = (number of spam emails) / (total emails) and P(Not Spam) = 1 - P(Spam).
3. Calculating Likelihood:
Calculate the likelihood of each word given the class labels using frequency counts.
4. Making Predictions:
For a new email, calculate the posterior probabilities for each class using Bayes' theorem:
P(\text{Spam} | X) \propto P(\text{Spam}) \prod_{i=1}^{n} P(x_i | \text{Spam}), \quad P(\text{Not Spam} | X) \propto P(\text{Not Spam}) \prod_{i=1}^{n} P(x_i | \text{Not Spam})
5. Final Classification:
Classify the email as spam if P(Spam | X) > P(Not Spam | X).
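A minimal sketch of the full spam pipeline with scikit-learn (CountVectorizer for word counts plus MultinomialNB); the tiny email corpus here is made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["free money winner", "project meeting tomorrow",
          "claim your free prize now", "lunch with the team",
          "winner winner free cash", "quarterly report attached"]
labels = [1, 0, 1, 0, 1, 0]           # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)   # word-count features

clf = MultinomialNB()                  # priors and per-word likelihoods from counts
clf.fit(X, labels)

new_email = ["free winner claim money"]
print(clf.predict(vectorizer.transform(new_email)))        # predicted class
print(clf.predict_proba(vectorizer.transform(new_email)))  # posterior probabilities
```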
8. Summary
Probabilistic models are foundational in machine learning, providing a robust framework for
understanding and modeling uncertainty. Conditional models focus on estimating the likelihood of
outcomes given input features, while generative models aim to model the underlying data distribution.
Each type of model has its advantages and limitations, making them suitable for various applications.
Understanding probabilistic models is crucial for leveraging their capabilities in artificial intelligence
and developing effective prediction systems.
Key Concepts:
Distance Metric: A function used to quantify the distance between data points in the feature
space.
K-Nearest Neighbors (KNN): A common variation where the classification or prediction is made
based on the (k) closest neighbors.
Choose a distance metric to measure the proximity between points. Common distance metrics include:
Euclidean Distance:
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
Manhattan Distance:
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
Minkowski Distance:
d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^m \right)^{1/m}
where (p) and (q) are two data points in (n)-dimensional space.
3. Compute Distances:
For a new data point (X_{new}), calculate the distance from (X_{new}) to all points in the training dataset.
4. Identify Neighbors:
Sort the calculated distances and select the (k) nearest neighbors.
5. Make Predictions:
For Classification: Assign the most frequent class among the (k) neighbors to (X_{new}).
For Regression: Compute the average (or weighted average) of the values of the (k) neighbors.
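The three distance metrics listed above translate directly into code; a brief numpy sketch with two arbitrary points:

```python
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    return np.sum(np.abs(p - q))

def minkowski(p, q, m):
    # Reduces to Manhattan for m = 1 and Euclidean for m = 2.
    return np.sum(np.abs(p - q) ** m) ** (1.0 / m)

p, q = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(euclidean(p, q))       # sqrt(9 + 4 + 0) ~ 3.61
print(manhattan(p, q))       # 3 + 2 + 0 = 5
print(minkowski(p, q, m=3))
```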
1. Data Representation:
Features: Sepal length, sepal width, petal length, petal width.
Target: Species (Setosa, Versicolor, Virginica).
2. Choosing (k):
Select a value for (k), e.g., (k=3).
3. Calculating Distances:
For a new flower with specific measurements, calculate the Euclidean distance to all flowers
in the training set.
4. Identifying Neighbors:
Select the 3 closest neighbors based on the calculated distances.
5. Making Predictions:
Assign the species based on the majority class among the 3 nearest neighbors.
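This worked example maps directly onto scikit-learn's KNeighborsClassifier; a minimal sketch (assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# k = 3 neighbors with the default Minkowski p=2 (i.e., Euclidean) distance.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify a new flower: [sepal length, sepal width, petal length, petal width] in cm.
new_flower = [[5.1, 3.5, 1.4, 0.2]]
print(iris.target_names[knn.predict(new_flower)[0]])   # e.g., 'setosa'
print("Test accuracy:", knn.score(X_test, y_test))
```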
8. Summary
The Nearest Neighbor algorithm is a powerful and intuitive method for classification and regression
tasks in machine learning. Its reliance on local data points makes it suitable for capturing complex
relationships. While it has several advantages, such as simplicity and adaptability, it also faces
challenges related to computational efficiency, sensitivity to noise, and the choice of parameters.
Understanding the Nearest Neighbor algorithm is crucial for leveraging its capabilities in various
artificial intelligence applications.