Unit 1 - Fundamentals of Ai - Part I

This document provides an overview of the fundamentals of Artificial Intelligence (AI) and Machine Learning (ML), covering definitions, history, types of ML, and key concepts such as data preprocessing and model development. It discusses the foundations of AI, including influences from philosophy, mathematics, neuroscience, and computer science, as well as the state-of-the-art models and their applications across various fields. Additionally, it addresses data quality issues, encoding techniques, and the lifecycle of data in the context of machine learning.

Uploaded by

Mohamed Nasih
Copyright
© All Rights Reserved
UNIT 1

FUNDAMENTALS OF AI and ML
What Is AI, The Foundations of Artificial Intelligence, The History of Artificial Intelligence, The State of the
Art, Risks and Benefits of AI, Types of ML: Supervised, Unsupervised, and Reinforcement Learning, ML
Workflow and Model Development Cycle, Data Preprocessing: Data Cleaning, Handling Missing Values,
Encoding Categorical Data, Data Normalisation, Feature Scaling. Introduction to Python ML Libraries:
Scikit-Learn, Pandas, Numpy, Dimensionality Reduction (PCA)
Artificial Intelligence

Artificial intelligence (AI) is a field of science that focuses on building machines and
computers that can learn, reason, and act in ways that would normally require
human intelligence.

AI systems use math and logic to simulate human reasoning, and learn from data
to make decisions and predictions.
History of AI

● Antiquity: Myths, stories, and rumors of artificial beings with intelligence or consciousness
● 1940s: The invention of the programmable digital computer, which was based on abstract
mathematical reasoning
● 1956: The founding of the field of AI research at a workshop at Dartmouth College
● Late 1950s: The Perceptron, one of the earliest artificial neural networks, a program that makes
decisions in a manner loosely inspired by neurons in the human brain
● 1970s to 1980s: The rise of expert systems, which encoded the knowledge of human experts as
rules in programs
● 1997: IBM's Deep Blue beat chess world champion Garry Kasparov in a six-game match
● 2022: OpenAI released the AI chatbot ChatGPT, which interacted with users in a more
conversational way than previous chatbots
AI - ML - DS - NLP
Types of AI
The Foundations of Artificial Intelligence

Influences on AI and ML

● Philosophy
● Mathematics
● Neuroscience
● Computer Science
● Economics
Philosophy & AI
Logic & Reasoning: Basis for knowledge representation, reasoning, and ethical AI.

Epistemology: Explores learning and inference.

Ethics: Guides fairness and moral decision-making.
Mathematics in AI

Linear Algebra & Probability: Foundations for neural networks and uncertainty modeling.

Optimization: Core to training algorithms (e.g., gradient descent).

Graph & Set Theory: Applied in clustering and decision trees.
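As a rough illustration of the optimization idea mentioned above, here is a minimal sketch of gradient descent on a one-variable function. The function f(x) = (x - 3)^2, the learning rate, and the step count are assumptions chosen for the example, not values from the slides.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Illustrative objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# Its minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Training a model follows the same pattern, except that x becomes the model's parameters and the gradient is computed from a loss over the training data.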
Neuroscience
Brain Modeling: Neural networks inspired by neurons and synapses.

Cognitive Processes: Perception, memory, and learning inform AI designs.

Hebbian Learning: An early model of how synapses strengthen ("cells that fire together wire
together") that inspired learning rules for artificial neural networks.
Computer Science

● Algorithms & Programming: Enable efficient data handling and AI model development.
● Data Structures: Key for fast computation and retrieval.
● Hardware Innovations: GPUs and TPUs accelerate AI workflows.
Economics

● Utility & Game Theory: Drive decision-making and strategic interactions.
● Behavioral Economics: Models irrationality for human-AI collaboration.
● Optimization: Resource allocation for autonomous systems.
State-of-the-art Models

State-of-the-art (SOTA) AI models represent the most advanced and high-performing algorithms or
architectures in a specific domain at a given time.

Examples:

● GPT-4 (OpenAI): Advanced language generation for text and code.
● DALL·E 3 (OpenAI): Text-to-image generation with high fidelity.
● AlphaZero (DeepMind): Reinforcement learning model for games like chess and Go.
● YOLOv8 (Ultralytics): Real-time object detection model.
● Whisper (OpenAI): Speech-to-text recognition across multiple languages.
● Stable Diffusion: Open-source image generation model.
The State of the Art Models
Current Capabilities

Generative AI: Tools like GPT for text generation and DALL·E for image synthesis.

Autonomous Systems: Self-driving cars navigating complex environments.

AI in Healthcare: Diagnosis, personalized medicine, and robotic surgeries.

Enterprise AI: Process automation, predictive analytics, and customer support.
Emerging Fields in AI

● Explainable AI (XAI): Making AI decisions transparent and interpretable.
● Quantum AI: Leveraging quantum
computing for faster problem-solving in
optimization and cryptography.
● AI in Edge Computing: Deploying AI on
devices for real-time processing.
● AI in Metaverse: Powering virtual
environments and interactions.
Limitations of Current AI Systems

● Lack of Generalization: AI excels in specific tasks but lacks human-like adaptability.
● Data Dependency: Requires vast, high-quality data for training.
● Bias and Fairness: Risk of perpetuating
societal biases.
● Energy Consumption: High computational
costs for large models.
Emerging Domains

● Quantum AI Models: Early-stage models leveraging quantum computing for problem-solving.
● Graph Neural Networks (GNNs):
Used for recommendations, social
networks, and drug discovery.
● Transformer-based Time Series
Models: For financial forecasting and
climate modeling.
Visual Representation of SOTA
Risks in AI
Benefits of AI
Pros vs Cons
Categories of data

DATA
● QUANTITATIVE: Discrete, Continuous, Interval and Ratio
● QUALITATIVE: Nominal, Ordinal
Main Differences
POINT OF DIFFERENTIATION             QUANTITATIVE DATA    QUALITATIVE DATA
Number Representation                ✓                    ×
How Much, How Many, How Often        ✓                    ×
How and Why This Happened            ×                    ✓
Represented in Categories            ×                    ✓
Arithmetic and Logical Operations    ✓                    ×
EXAMPLE                              Exam score, weight,  Colors, Names,
                                     Height, Temperature  Ethnicity
Main Differences

                  CONTINUOUS DATA                      DISCRETE DATA
Representation    Can be represented in decimals.      Cannot be represented in decimals.
                  E.g. height = 3.5 feet               E.g. number of cars ≠ 3.5
Measuring         Only measured                        Only countable
Examples          Temperature, tree height,            Number of hot days,
                  average distance in kilometers       tree height class (tall/medium/short)
Main Differences
Points of Differentiation    Nominal Data                       Ordinal Data
Label Data                   ✓                                  ×
Order Data                   ×                                  ✓
Arithmetic Operations        ×                                  ×
Logical Operations           ×                                  ✓
Examples                     Gender (M, F)                      T-shirt Size (S, M, XL)
                             (M + F) / 2 is not meaningful      (S + M) / 2 is not meaningful
                             M > F is not meaningful            S < M is meaningful
                             Marital Status (Married, Single)   Letter Grades (A, B, C)
Main Differences
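Points of Differentiation    Interval Data    Ratio Data
Numerical value              ✓                ✓
Order                        ✓                ✓
Difference                   ✓                ✓
True Zero                    ×                ✓
Example                      Temperature      Weight

Because interval data has no true zero, ratios are not meaningful: 40 °C is not "twice as hot" as
20 °C. Ratio data has a true zero, so a weight of 60 is twice a weight of 30.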
DATA based on Its STRUCTURE
DATA QUALITY ISSUES

● Inconsistencies
● Missing Value
● Duplicacy
Lifecycle
Data Collection

● Surveys
● Interviews
● Records
● Observations
● Experiments
Data Pre-Processing

● Handling Missing data
● Handling Malicious data
● Handling Erroneous data
● Removing Irrelevant data
● Removing Outliers
● Handling Inconsistent data
Removing Outliers
Handling Missing values

Drop Missing values
• Drop the variable (the column that contains the missing values)
• Drop the instances (the rows that contain the missing values)

Replace Missing values
• Mean, Median, mode
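The two strategies above can be sketched with pandas as follows. This is a minimal example under stated assumptions: the DataFrame, its column names (`age`, `city`, `notes`), and its values are made up for illustration and do not come from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values (illustrative only).
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
    "notes": [np.nan, np.nan, np.nan, "ok"],
})

# Drop the variable: remove a column that is mostly missing.
dropped_variable = df.drop(columns=["notes"])

# Drop the instances: remove rows with a missing age or city.
dropped_instances = df.dropna(subset=["age", "city"])

# Replace missing values: median for a numeric column, mode for a categorical one.
filled = df.drop(columns=["notes"]).copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

print(filled)
```

Dropping is simple but loses data; replacing keeps every row at the cost of introducing estimated values, so the choice depends on how much data is missing and why.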
Handling Malicious Data
Handling Erroneous Data
Handling Irrelevant Data

Irrelevant data is eliminated through common sense or domain knowledge.

Example: a student’s ID is irrelevant in predicting the student’s GPA.
Exploratory Data Analysis
● What is the data telling us?
● Why did something happen?
Model Building

Machine Learning
● Supervised: Regression, Classification
● Unsupervised: Clustering, Dimensionality Reduction
● Reinforcement
Applications of Machine Learning

1 Healthcare
Machine learning is being used to diagnose diseases, predict
treatment outcomes, and develop new drugs.

2 Finance
Machine learning is being used to detect fraud, predict
market trends, and manage risk.

3 Marketing
Machine learning is being used for targeted advertising,
customer segmentation, and personalized recommendations.
ENCODING
● Encoding is the transformation of categorical variables to binary or numerical values.
● For example, a gender variable with the values male and female can be encoded as 1 and 0.
● Many modeling methods, such as linear regression, SVM, and neural networks, require
categorical variables to be encoded before training.
● Types of Encoding techniques:
• Ordinal Encoding
• Label Encoding
• One-Hot Encoding
ORDINAL ENCODING
● An Ordinal Encoder is used to encode categorical features into an ordinal
numerical value (ordered set).
● This approach transforms categorical values to numerical values in ordered
sets.
● This encoding technique appears almost similar to Label Encoding.
● However, label encoding does not consider whether a variable is ordinal or not,
whereas ordinal encoding assigns a sequence of numerical values that follows
the order of the data.
● Let’s say the feedback data is collected using a Likert scale in which numerical
code 1 is assigned to Poor, 2 to Good, 3 to Very Good, and 4 to Excellent. We know
that 4 is better than 3 and much better than 2, but taking the difference between
4 and 2 is meaningless (Excellent minus Good is meaningless).
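The Likert-scale example above can be sketched with scikit-learn's `OrdinalEncoder`, passing the category order explicitly. The DataFrame and the column name `rating` are hypothetical. Note that scikit-learn numbers categories from 0, so Poor to Excellent become 0 to 3 here rather than 1 to 4.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feedback column (illustrative values only).
feedback = pd.DataFrame({"rating": ["Good", "Excellent", "Poor", "Very Good"]})

# The order is stated explicitly; without it, categories would be sorted alphabetically.
order = ["Poor", "Good", "Very Good", "Excellent"]
encoder = OrdinalEncoder(categories=[order])

# OrdinalEncoder expects 2-D input and returns a 2-D float array.
codes = encoder.fit_transform(feedback[["rating"]])
feedback["rating_code"] = codes.ravel().astype(int)

print(feedback)
```

The codes preserve the ranking (Excellent > Very Good > Good > Poor), but, as noted above, differences between codes should not be read as meaningful distances.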
LABEL ENCODING
● In label encoding, we replace the categorical value with a numeric value between 0 and the
number of classes minus 1. If the categorical variable contains 5
distinct classes, we use (0, 1, 2, 3, and 4).
● Let us take COVID-19 cases in India across states. In this
data frame, the State column contains categorical values, which are not very
machine-friendly, and the rest of the columns contain numerical values.
● Let us perform label encoding for the State column.
LABEL ENCODING

● The numbering is assigned in alphabetical order. Delhi is assigned 0, followed by Gujarat as 1,
and so on.
Label Encoding in Python: Approach 1 – scikit-learn library approach

1. Create an instance of LabelEncoder() and store it in the labelencoder variable/object.
2. Apply fit and transform, which assigns a numerical value to each categorical value; the result
is stored in a new column called “State_N”.
3. Note that we have added a new column called “State_N”, which contains the numerical value
associated with each categorical value, and the column called State is still present in
the dataframe.
4. The State column needs to be removed before we feed the final preprocessed data to the
machine learning model.
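A minimal sketch of the four steps above. The `covid19` DataFrame here is a stand-in: the state names are taken from the text, but the `Cases` column and its numbers are placeholders invented for the example, not real case counts.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for the COVID-19 data frame; "Cases" values are placeholders.
covid19 = pd.DataFrame({
    "State": ["Maharashtra", "Delhi", "Gujarat", "Uttar Pradesh", "Delhi"],
    "Cases": [10, 20, 30, 40, 50],
})

# Step 1: create the encoder.
labelencoder = LabelEncoder()

# Steps 2-3: fit and transform, storing the result in a new column "State_N".
covid19["State_N"] = labelencoder.fit_transform(covid19["State"])

# The learned mapping follows alphabetical order of the class names.
mapping = {str(name): code for code, name in enumerate(labelencoder.classes_)}
print(mapping)

# Step 4: remove the original text column before modeling.
model_input = covid19.drop(columns=["State"])
print(model_input)
```

scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` is the usual counterpart, though the single-column usage shown here matches the steps described above.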
Label Encoding in Python: Approach 2 – Category Codes

1. As you had already observed, the “State” column has the object datatype by default, so we
first need to convert “State” to a category type with the help of pandas.
2. We can access the codes of the categories by running covid19[“State”].cat.codes
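The category-codes approach can be sketched as below. As before, the `covid19` DataFrame is a stand-in with placeholder `Cases` values, and storing the codes in a column named `State_N` is an assumption carried over from Approach 1.

```python
import pandas as pd

# Stand-in for the COVID-19 data frame; "Cases" values are placeholders.
covid19 = pd.DataFrame({
    "State": ["Maharashtra", "Delhi", "Gujarat", "Uttar Pradesh", "Delhi"],
    "Cases": [10, 20, 30, 40, 50],
})

# Step 1: convert the text column to the pandas category dtype.
covid19["State"] = covid19["State"].astype("category")

# Step 2: read the integer code of each row's category.
covid19["State_N"] = covid19["State"].cat.codes

# The categories are sorted, so the codes follow alphabetical order.
print(list(covid19["State"].cat.categories))
print(covid19)
```

This produces the same numbering as `LabelEncoder` for this data, without needing scikit-learn.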
ONE-HOT ENCODING

● In this approach, for each category of a feature, we create a new column (sometimes
called a dummy variable) with binary encoding (0 or 1) to denote whether a
particular row belongs to this category.
● Let us consider the previous State column. New columns are created, starting from
state name Maharashtra till Uttar Pradesh, and there are 6 new columns created.
● 1 is assigned to a row that belongs to this category, and 0 is assigned to the rest
of the rows, which do not belong to this category.
ONE-HOT ENCODING

● A potential drawback of this method is a significant increase in the dimensionality of the
dataset, which contributes to the problem known as the Curse of Dimensionality.
● The reason is that one-hot encoding creates additional columns, one for each unique value
in the set of the categorical attribute we’d like to encode.
● So, if we have a categorical attribute that contains, say, 1,000 unique values, one-hot
encoding will generate 1,000 additional new attributes, and this is not desirable.
ONE-HOT ENCODING

● One-hot encoding is quite a powerful tool, but it is best suited to
categorical data that have a low number of unique values.
● Creating dummy variables introduces a form of redundancy to the
dataset: the value of any one dummy column can be inferred from the others.
● This is often referred to as the dummy-variable trap, and it is a common
practice to remove one dummy variable column (known as the
reference) from such an encoding.
Approach 1 – scikit-learn library approach

1. One-hot encoding is also part of data preprocessing, so we use the preprocessing module
from the sklearn package and import the OneHotEncoder class from it.
2. Instantiate the OneHotEncoder object; note that the parameter drop = ‘first’ handles the
dummy variable trap.
3. Perform one-hot encoding for the categorical variable.
4. Merge the one-hot encoded dummy variables into the actual data frame, but do not forget to
remove the actual column called “State”.
5. In the output, we can observe that the dummy variable trap has been taken care of.
Approach 2 – Using Pandas: with the help of the get_dummies function

● The pd.get_dummies function generates another DataFrame, so we need to
concatenate (or add) its columns to our original DataFrame, and also remove
the column called “State”.
● Here, we use the pd.concat function, indicating with the axis=1 argument that
we want to concatenate the columns of the two DataFrames given in the list
(which is the first argument of pd.concat). Don’t forget to remove the
actual “State” column.
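A minimal sketch of the pandas approach described above, using the same stand-in `covid19` DataFrame with placeholder `Cases` values. The `prefix` and `dtype=int` arguments are choices made for this example so that the column names and 0/1 values are explicit.

```python
import pandas as pd

# Stand-in for the COVID-19 data frame; "Cases" values are placeholders.
covid19 = pd.DataFrame({
    "State": ["Maharashtra", "Delhi", "Gujarat", "Delhi"],
    "Cases": [10, 20, 30, 40],
})

# get_dummies returns a separate DataFrame with one 0/1 column per state.
dummies = pd.get_dummies(covid19["State"], prefix="State", dtype=int)

# Concatenate column-wise (axis=1), then remove the original "State" column.
result = pd.concat([covid19, dummies], axis=1).drop(columns=["State"])

print(result)
```

Passing `drop_first=True` to `pd.get_dummies` would remove the first dummy column, which is one way to avoid the dummy-variable trap in this approach.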