
CSE445 NSU Week - 1

The document discusses the concepts of AI and machine learning, defining AI as the ability of machines to mimic human behavior and ML as a subset of AI that learns from data. It outlines various machine learning problems, including checkers learning, handwriting recognition, student admission prediction, and email classification, while also explaining different types of learning such as supervised, unsupervised, semi-supervised, and reinforcement learning. Additionally, it highlights issues related to bias in ML models and ethical considerations in data mining.

Uploaded by

Rabiul

AI and machine learning problems

- AI - the ability of a computer or machine to mimic human intelligent behavior and perform human-like tasks
- ML - a subset of AI that learns automatically from data without being explicitly programmed
- All machine learning is AI, but not all AI is machine learning
- Machine Learning
- It is a science that grew out of AI
- The term ML was coined in 1959 by Arthur Samuel
- A computer program is said to learn from experience E, with respect to some task T and performance measure P, if its performance at task T, as measured by P, improves with experience E. (Tom M. Mitchell)
Checkers learning problem
Task T: Playing checkers

Performance measure, P: % of games won against opponents

Training Experience, E: Playing practice games against itself
Handwriting Recognition Problem
Task T: Recognizing and classifying handwritten words/characters within images

Performance measure, P: % of words/characters correctly classified

Training Experience, E: A collection of handwritten words/characters with given classifications

- MNIST Dataset (Yann LeCun et al.)
- BanglaLekha Isolated (Nabeel Mohammed, Sifat Momen et al.)
- https://data.mendeley.com/datasets/hf6sf8zrkc/2
- The dataset comprises 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters (166,105 handwritten character images in total)
- Bangla Sign Language Dataset (Intisar Tahmid Naheen, Riasat Khan et al.)
- Paper: Deep Learning-based Bangla Sign Language Detection with an Edge Device
- Bangla sign language dataset of 49 classes with approximately 80 images per class
- https://doi.org/10.1016/j.iswa.2023.200224
Student Admission Prediction Problem
Task T: Classifying students’ admissions into categories/rankings of universities

Performance measure, P: % of students’ admissions correctly classified

Training Experience, E: A dataset of student admissions with given university categories/rankings

- Masters and Doctor of Philosophy admission prediction of Bangladeshi students into different classes of universities (Md Naimul Islam Suvon et al.)
- 400 records for PhD students and 300 for MS candidates
- Nine classes of universities according to the QS World University Rankings

Classifying an incoming email as either spam or not spam
Task T: Correctly classifying an email as either spam or not spam

Performance measure, P: % of emails correctly classified

Training Experience, E: A database of emails properly labeled as either spam or not spam
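The performance measure P for this problem is ordinary classification accuracy. The sketch below shows one way to compute it. The labeled emails and the keyword-based `classify` function are hypothetical stand-ins invented for illustration; they are not taken from the slides and are not a real spam filter.

```python
# Hypothetical labeled emails (text, is_spam), invented for illustration only.
LABELED_EMAILS = [
    ("Win a free prize now", True),
    ("Meeting moved to 3 pm", False),
    ("Free money, click here", True),
    ("Lecture notes attached", False),
    ("Claim your free lecture notes", False),
]


def classify(text):
    """Toy stand-in for a learned model: flag any email containing 'free'."""
    return "free" in text.lower()


def accuracy(examples, predict):
    """Performance measure P: fraction of examples classified correctly."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


p = accuracy(LABELED_EMAILS, classify)
print(f"P = {p:.0%}")  # the last email is misclassified, so P = 80%
```

A learning algorithm would replace the fixed rule in `classify` with one fitted to the experience E, and P would be measured on emails not used for fitting.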
Robot driving learning problem
Task T: Driving on public highways using vision sensors

Performance measure, P: Average distance travelled before an error (as judged by a human observer)

Training Experience, E: A sequence of images and steering commands recorded while observing a human driver
Different types of learning
- Supervised learning
- Semi-Supervised learning
- Unsupervised learning
- Reinforcement learning
Supervised Learning: Classification Problem
- In supervised learning, the algorithm builds a mathematical model from a set of data that contains both inputs and desired outputs <X, y>.
- The training data fed to the algorithm includes the desired answers/solutions (labels)
- y: target variable, output, dependent variable, class, label
- X: input variables, attributes, features, independent variables
- Classification problem: discrete-valued output (Male or Female, True or False, Yes or No); with two possible values it is a binary classification problem
- Example: the data contains 14 instances/records/samples and 4 attributes. The target variable is play.

Decision Tree Model


Supervised Learning
- Data:
- 1. Qualitative: non-numeric/categorical (e.g. hair color)
- 2. Quantitative: numeric
- Quantitative data is either discrete (only certain specific values) or continuous (any value within a range)

- The output y decides whether the problem is classification or regression

- Classification (y categorical/discrete)
- Classification algorithms are used when outputs are restricted to some discrete set of values
- E.g. classifying whether an email is spam or not, or whether a tumor is malignant or benign.
- Classifying from medical records whether a pregnancy is high risk or not.
- Classifying the species of iris: https://en.wikipedia.org/wiki/Iris_flower_data_set
- Regression (y numeric and continuous)
- Regression algorithms are used when outputs are continuous, i.e., they may take any value within a range.
- E.g. predicting the price of a house from its location, area and number of rooms
Supervised Learning
The ML algorithm uses the training dataset <X, y> to create a model/assumption/hypothesis which can be used to make predictions on unseen/new data

The ML algorithm iteratively makes predictions on the training data and is corrected by the known outputs (the supervision of a "teacher")
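The predict-then-correct loop described above can be illustrated with a perceptron, one of the simplest supervised learners. This is a minimal sketch and not the method used in the slides: the choice of a perceptron, the logical-AND training set, and the names `train` and `predict` are all assumptions made for illustration.

```python
# Training set <X, y>: inputs with their desired outputs (logical AND).
# Chosen for illustration because it is small and linearly separable.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, x):
    """Model/hypothesis: output 1 if the weighted sum is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train(data, epochs=20, lr=1.0):
    """Repeatedly predict on the training data and correct using the labels."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(weights, bias, x)  # the "teacher" signal
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


weights, bias = train(DATA)
for x, y in DATA:
    print(x, "label:", y, "predicted:", predict(weights, bias, x))
```

When a prediction matches its label the error is zero and nothing changes; the model is only adjusted on mistakes, which is the sense in which the labels supervise learning.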
Supervised Learning: Regression Problem
● The table below contains home prices in Monroe Township, NJ. Here the price depends on area (square feet), number of bedrooms and age of the home (in years). Given these prices, we have to predict the prices of new homes based on area, bedrooms and age.

● Given these home prices, find the price of a home that has:
● 3000 sq ft area, 3 bedrooms, 40 years old
● 2500 sq ft area, 4 bedrooms, 5 years old
● Regression problem: continuous value output
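One common way to approach such a problem is multiple linear regression fitted by ordinary least squares. The sketch below assumes that technique; the slides do not say which model is used. The `HOMES` rows are synthetic, generated from an exact linear rule so the fit can be checked by hand. They are not the prices from the slide's table, and the predicted values therefore say nothing about real homes.

```python
# Synthetic data, invented for illustration (NOT the slide's table):
# (area in sq ft, bedrooms, age in years, price in USD)
HOMES = [
    (2600, 3, 20, 372000),
    (3000, 4, 15, 435000),
    (3200, 3, 18, 446000),
    (3600, 3, 30, 482000),
    (4000, 5, 8, 572000),
    (4100, 6, 8, 594000),
]


def features(area_sqft, bedrooms, age_years):
    """Feature vector with a leading 1 for the intercept; area in 1000 sq ft."""
    return [1.0, area_sqft / 1000.0, float(bedrooms), float(age_years)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [features(area, beds, age) for area, beds, age, _ in rows]
    ys = [float(price) for *_, price in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, area_sqft, bedrooms, age_years):
    """Continuous-valued output: the predicted price in USD."""
    x = features(area_sqft, bedrooms, age_years)
    return sum(wi * xi for wi, xi in zip(w, x))


weights = fit(HOMES)
for home in [(3000, 3, 40), (2500, 4, 5)]:
    print(home, round(predict(weights, *home)))
```

The output is a number on a continuous scale rather than a class label, which is what makes this a regression problem.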
Supervised Learning - Classification
Regression
Unsupervised Learning
- Unlabeled dataset
- The system learns without direct human supervision
- Unsupervised learning algorithms take a set of data that contains only inputs and find structure/patterns in it, such as groupings or clusters of data points.
- The algorithms therefore learn from data that has not been labeled, classified or categorized
- Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data.
- Useful for business intelligence
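Clustering, mentioned above, can be sketched with k-means, one widely used unsupervised algorithm. This is a minimal illustration under stated assumptions: the slides do not name k-means, the 2-D `POINTS` are invented, and the naive initialization (first k points as starting centroids) is chosen only to keep the example deterministic.

```python
# Unlabeled 2-D points, invented for illustration: inputs only, no targets.
POINTS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # naive, deterministic initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            centroid(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(POINTS, k=2)
for c, members in zip(centroids, clusters):
    print("centroid:", c, "members:", members)
```

No labels are used anywhere: the two groups emerge only from the distances between the points.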
The Cocktail Party Problem
● Encountered when sounds from different sources in the real world mix in the air before arriving at the ear

● The goal is to separate the independent sources from the mixed signal


Unsupervised Learning
- Example: recreation of the periodic table with an unsupervised machine learning algorithm
- Example: clustering of different types of stars
Semi-Supervised Learning
● Supervised learning works on labeled data
● Unsupervised learning works on unlabeled data
● In semi-supervised learning, the training data contains both labeled and unlabeled data
○ We have a large dataset. Manually labeling the entire dataset is laborious and expensive

○ One solution could be to label a sample of the dataset and train a model on the labeled portion only. However, this would not fully utilize the larger dataset, so the resulting model may be less robust

○ A potential solution: (1) label a sample of the large dataset, (2) train a model using this labeled portion, (3) use the model to predict labels for the unlabeled portion (pseudo-labeling), (4) train using the entire dataset

○ e.g., Google Photos clusters similar faces and asks the user whether they are the same person
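The four pseudo-labeling steps above can be sketched as follows. The nearest-centroid "model", the one-dimensional data, and the class names `small` and `large` are all assumptions made to keep the example tiny; any supervised learner could take the model's place.

```python
def train(labeled):
    """Nearest-centroid 'model': the mean feature value of each class."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda y: abs(x - model[y]))


# Step 1: only a small sample of the dataset has been labeled by hand.
labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
unlabeled = [1.5, 2.5, 3.0, 8.0, 8.5, 11.0]

# Step 2: train a model on the labeled portion.
model = train(labeled)

# Step 3: use that model to predict labels for the unlabeled portion.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]

# Step 4: retrain on the entire dataset (true labels plus pseudo-labels).
final_model = train(labeled + pseudo_labeled)

print("initial centroids:", model)
print("pseudo-labels:", pseudo_labeled)
print("final centroids:", final_model)
```

Pseudo-labels can be wrong, so a practical variant would typically keep only the predictions the model is confident about; that filtering step is omitted here for brevity.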
Reinforcement Learning
● The learning system (agent) can:

○ Observe the environment

○ Explore the environment to collect data

○ Select and perform an action based on the environment, using a policy

○ Get rewards/penalties for the action

○ Change its state based on the reward

○ Aim: maximize reward

● Reinforcement learning differs from supervised learning: in supervised learning, the training data comes with an answer key, so the model is trained on the correct answers
● In reinforcement learning, there is no answer key; the agent decides what to do to perform the given task. In the absence of a training dataset, it has to learn from its own experience
● Very effective in controlled environments, where the environment plays the role of the dataset (such as a game of chess)

○ With the progress in deep learning, increasingly used in more complex tasks (such as driving the Mars rover, industrial automation)
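The observe, act, reward loop above can be illustrated with tabular Q-learning, one standard reinforcement learning algorithm. This is a sketch under assumptions: the slides do not name Q-learning, and the five-cell corridor environment, the reward of 1 at the goal, and the constants `ALPHA` and `GAMMA` are invented for illustration. For simplicity the agent explores with purely random actions during training, and the learned policy is read off the Q-table afterwards.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def train(episodes=500, seed=0):
    """Learn action values Q(state, action) purely from experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)       # explore the environment
            nxt, reward = step(state, action)  # observe the outcome
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = nxt
    return q


q = train()
# Greedy policy: in each non-terminal cell, pick the highest-valued action.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

There is no answer key telling the agent which move is correct in each cell; the preference for moving right emerges only from the rewards it collects.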
Issues related to AI (and more particularly ML)
- An ML model depends on the data it is fed.
- Bias in the data (conscious or unconscious) can easily be picked up by ML models
- ML systems used for criminal risk assessment by the NYPD have been found to be biased against black people.
- Such systems are used to assess whether a person is a potential criminal
Issues Related to ML
Genderify - designed to identify a person’s gender by analyzing their username or email address

For the word “professor,” Genderify predicted a 98.4 percent probability of male. Meanwhile, “stupid” returned a 61.7 percent female prediction
ML and Ethics (More Complex case)
- When applied to people, data mining is frequently used to discriminate: who gets the loan, who gets the special offer, and so on.
- Certain kinds of discrimination (racial, sexual, religious and so on) are not only unethical but also illegal.
- Using sexual and racial information for medical diagnosis is certainly ethical, but using the same information when mining loan payment behavior is not.
ML and Ethics (Difficulty to solve it)
- Reidentification techniques have provided sobering insights into the difficulty of anonymizing data.
- 85% of Americans can be identified from just three pieces of information: zip code, birth date and sex.
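The reidentification risk can be made concrete by counting how many records are unique on a given combination of attributes. The sketch below uses five hypothetical records invented for illustration; the field names and values are assumptions, and the resulting fractions say nothing about real populations.

```python
from collections import Counter

# Hypothetical "anonymized" records (names removed), invented for illustration.
RECORDS = [
    {"zip": "10001", "birth_date": "1990-01-01", "sex": "F"},
    {"zip": "10001", "birth_date": "1985-06-15", "sex": "F"},
    {"zip": "10001", "birth_date": "1990-01-01", "sex": "M"},
    {"zip": "10002", "birth_date": "1990-01-01", "sex": "M"},
    {"zip": "10002", "birth_date": "1990-01-01", "sex": "M"},
]


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears only once,
    i.e. records that the combination alone would single out."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


for keys in [("sex",), ("zip", "sex"), ("zip", "birth_date", "sex")]:
    print(keys, unique_fraction(RECORDS, keys))
```

Each attribute on its own identifies few people, but combining them quickly shrinks the groups, which is why removing names is not enough to anonymize a dataset.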
