Unit 1 - Fundamentals of AI - Part I
FUNDAMENTALS OF AI and ML
What Is AI, The Foundations of Artificial Intelligence, The History of Artificial Intelligence, The State of the
Art, Risks and Benefits of AI, Types of ML: Supervised, Unsupervised, and Reinforcement Learning, ML
Workflow and Model Development Cycle, Data Preprocessing: Data Cleaning, Handling Missing Values,
Encoding Categorical Data, Data Normalisation, Feature Scaling. Introduction to Python ML Libraries:
Scikit-Learn, Pandas, Numpy, Dimensionality Reduction (PCA)
Artificial Intelligence
Artificial intelligence (AI) is a field of science that focuses on building machines and
computers that can learn, reason, and act in ways that would normally require
human intelligence.
AI systems use math and logic to simulate human reasoning, and learn from data
to make decisions and predictions.
History of AI
● Antiquity: Myths, stories, and rumors of artificial beings with intelligence or consciousness
● 1940s: The invention of the programmable digital computer, which was based on abstract
mathematical reasoning
● 1956: The founding of the field of AI research at a workshop at Dartmouth College
● Late 1950s: The perceptron, considered the first artificial neural network, a program that
makes decisions in a manner loosely modeled on the human brain
● 1970s: The popularity of expert systems, which used the knowledge of experts to create
programs
● 1997: IBM's Deep Blue beat chess world champion Garry Kasparov in a six-game series
● 2022: OpenAI released the AI chatbot ChatGPT, which interacted with users in a more
realistic way than previous chatbots
AI - ML - DS - NLP
Types of AI
The Foundations of Artificial Intelligence
Influences on AI and ML
Computer Science
Economics
Philosophy & AI
● Logic & Reasoning: Basis for knowledge representation, reasoning, and ethical AI.
● Algorithms & Programming: Enable efficient data handling and AI model development.
● Data Structures: Key for fast computation and retrieval.
● Hardware Innovations: GPUs and TPUs accelerate AI workflows.
Economics
State-of-the-art (SOTA) AI models represent the most advanced and high-performing algorithms or
architectures in a specific domain at a given time.
Examples:
DATA
● QUANTITATIVE: DISCRETE, CONTINUOUS, INTERVAL and RATIO
● QUALITATIVE: NOMINAL, ORDINAL
Main Differences
POINT OF DIFFERENTIATION | QUANTITATIVE DATA | QUALITATIVE DATA
Number Representation | Yes | No
How Much, How Many, How Often | Yes | No
How and Why This Happened | No | Yes
Represented in Categories | No | Yes
Arithmetic and Logical Operations | Yes | No
EXAMPLE | Exam score, Weight, Height, Temperature | Colors, Names, Ethnicity
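Main Differences
Points of Differentiation | Nominal Data | Ordinal Data
Logical Operations | No | Yes
Examples | Gender (M, F), Marital Status (Married, Single) | T-shirt Size (S, M, XL), Letter Grades (A, B, C)
Averaging | (M + F) / 2 is not meaningful | (S + M) / 2 is not meaningful
Comparison | M > F is not meaningful | S < M is meaningful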
Main Differences
Points of Differentiation | Interval Data | Ratio Data
Numerical value | Yes | Yes
Order | Yes | Yes
Difference | Yes | Yes
True Zero | No | Yes
Example | Temperature | Weight
DATA based on Its STRUCTURE
DATA QUALITY ISSUES
● Inconsistencies
● Missing values
● Duplicates
Lifecycle
Data Collection
● Surveys
● Interviews
● Records
● Observations
● Experiments
Data Pre-Processing
● Handling missing data
● Malicious data
● Erroneous data
● Inconsistent data
● Removing outliers
Handling Missing values
● Irrelevant attributes can be eliminated through common sense or domain knowledge.
● Example: a student's ID is irrelevant in predicting the student's GPA.
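The student ID example can be sketched with pandas. This is a minimal illustration, not a prescribed method: the `students` table, its column names, and its values are made up, and filling with the median is only one common way to handle a missing value.

```python
import pandas as pd

# Hypothetical student records (illustrative values only).
students = pd.DataFrame({
    "student_id": [101, 102, 103, 104],
    "study_hours": [10.0, None, 6.0, 8.0],
    "gpa": [3.6, 3.1, 2.8, 3.3],
})

# Domain knowledge: the ID carries no information about GPA, so drop it.
features = students.drop(columns=["student_id"])

# Count the missing values in each remaining column.
missing_counts = features.isna().sum()

# One common option: fill a missing numeric value with the column median.
median_hours = features["study_hours"].median()
features["study_hours"] = features["study_hours"].fillna(median_hours)

print(missing_counts)
print(features)
```

Dropping rows with `dropna()` is an alternative when only a few records are incomplete.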
Exploratory Data Analysis
● What is the data telling us?
● Why did something happen?
Model Building
Machine Learning
● Supervised: Regression, Classification
● Unsupervised: Clustering, Dimensionality Reduction
● Reinforcement
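The difference between the supervised and unsupervised branches can be sketched with scikit-learn. This is only an illustration under assumptions: the Iris dataset and the specific estimators (`LogisticRegression`, `KMeans`, `PCA`) are choices made here, not ones named in the slides, and reinforcement learning is left out because it needs an environment to interact with.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# X holds the features, y holds the known class labels.
X, y = load_iris(return_X_y=True)

# Supervised (classification): the model learns from features AND labels.
classifier = LogisticRegression(max_iter=1000).fit(X, y)
predicted_labels = classifier.predict(X[:5])

# Unsupervised (clustering): the model sees only the features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Unsupervised (dimensionality reduction): 4 features projected onto 2.
X_2d = PCA(n_components=2).fit_transform(X)

print(predicted_labels.shape, len(set(cluster_ids)), X_2d.shape)
```

Note that only the classifier is given `y`; the clustering and PCA steps never see the labels.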
Applications of Machine Learning
1 Healthcare
Machine learning is being used to diagnose diseases,
predict treatment outcomes, and develop new drugs.
2 Finance
Machine learning is being used to detect fraud, predict
market trends, and manage risk.
3 Marketing
Machine learning is being used for targeted advertising,
customer segmentation, and personalized
recommendations.
ENCODING
● Encoding is the transformation of categorical variables to binary or numerical values.
● For example, gender can be encoded by treating male and female as 1 and 0.
● Categorical variables must be encoded for many modeling methods, such as linear
regression, SVM, and neural networks.
● Types of Encoding techniques:
● Ordinal Encoding
● Label Encoding
● One-Hot Encoding
ORDINAL ENCODING
● An Ordinal Encoder encodes categorical features as ordinal numerical
values (an ordered set).
● This approach transforms categorical values into numerical values that
preserve the order of the categories.
● This encoding technique is very similar to Label Encoding.
● However, label encoding does not consider whether a variable is ordinal,
whereas ordinal encoding assigns a sequence of numerical values
according to the order of the data.
● Let's say the feedback data is collected using a Likert scale in which
numerical code 1 is assigned to Poor, 2 to Good, 3 to Very Good, and 4 to
Excellent. We know that 4 is better than 3 and 3 is better than 2, but
taking the difference between 4 and 2 is meaningless (Excellent minus
Good is meaningless).
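The Likert-scale example above can be sketched with pandas. This is a minimal illustration: the `feedback` responses are made up, and an explicit dictionary is used here so that the codes match the 1 to 4 scale in the text (a library encoder would typically start at 0).

```python
import pandas as pd

# Hypothetical feedback responses (illustrative data only).
feedback = pd.Series(["Good", "Excellent", "Poor", "Very Good", "Good"])

# The order comes from the Likert scale: Poor < Good < Very Good < Excellent.
order = ["Poor", "Good", "Very Good", "Excellent"]
mapping = {label: code for code, label in enumerate(order, start=1)}

# Replace each label with its ordered numerical code.
encoded = feedback.map(mapping)

print(mapping)
print(encoded.tolist())  # [2, 4, 1, 3, 2]
```

Comparisons such as `encoded > 2` are meaningful on these codes, while differences between them are not.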
LABEL ENCODING
● We replace each categorical value with a numeric value between 0 and the
number of classes minus 1. If the categorical variable contains 5
distinct classes, we use 0, 1, 2, 3, and 4.
● As an example, take COVID-19 cases in India across states. In the data
frame, the State column contains categorical values, which are not very
machine-friendly, while the rest of the columns contain numerical values.
● Let us perform label encoding on the State column.
LABEL ENCODING
Approach 1
Label Encoding in Python: Approach 2 – Category Codes
1. As you had already observed, the "State" column has the object datatype
by default; hence, we need to convert "State" to a category type with the
help of pandas.
2. We can access the codes of the categories by running
covid19["State"].cat.codes
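The category-codes approach can be sketched as follows. The `covid19` name and the "State" column come from the slides, but the rows, the "Confirmed" column, and its values are invented here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the covid19 data frame (made-up values).
covid19 = pd.DataFrame({
    "State": ["Maharashtra", "Kerala", "Delhi", "Kerala", "Uttar Pradesh"],
    "Confirmed": [120, 80, 60, 95, 40],
})

# Step 1: convert the object-typed column to the pandas category type.
covid19["State"] = covid19["State"].astype("category")

# Step 2: read the integer code assigned to each category.
covid19["State_code"] = covid19["State"].cat.codes

print(covid19)
```

By default pandas sorts the categories alphabetically before numbering them, so the codes run from 0 to the number of distinct states minus 1 and carry no meaningful order.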
ONE-HOT ENCODING
● In this approach, for each category of a feature, we create a new column
(sometimes called a dummy variable) with binary encoding (0 or 1) to
denote whether a particular row belongs to this category.
● Consider the previous State column: new columns are created, one per
state name from Maharashtra to Uttar Pradesh, for a total of 6 new columns.
● 1 is assigned to a row that belongs to the category, and 0 is assigned to
the rows that do not belong to it.
● A potential drawback of this method is a significant increase in the
dimensionality of the dataset, which can lead to the problem known as
the Curse of Dimensionality.
● The cause is that one-hot encoding creates additional columns, one for
each unique value of the categorical attribute we'd like to encode.
● So, if we have a categorical attribute that contains, say, 1,000 unique
values, then one-hot encoding will generate 1,000 additional new
attributes, and this is not desirable.
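Both the encoding and its drawback can be sketched with `pandas.get_dummies`. This is a minimal illustration: apart from Maharashtra and Uttar Pradesh, the state names are assumed here, and the 1,000-value attribute is synthetic.

```python
import pandas as pd

# Hypothetical State column with 6 distinct values (names partly assumed).
covid19 = pd.DataFrame({
    "State": ["Maharashtra", "Kerala", "Karnataka",
              "Tamil Nadu", "Delhi", "Uttar Pradesh"],
})

# One new 0/1 column per distinct state.
one_hot = pd.get_dummies(covid19["State"], dtype=int)
print(one_hot)

# Drawback: an attribute with 1,000 unique values yields 1,000 new columns.
many_values = pd.Series([f"value_{i}" for i in range(1000)])
wide = pd.get_dummies(many_values, dtype=int)
print(wide.shape)
```

Each row of `one_hot` contains exactly one 1, in the column of the state that row belongs to.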