
W1 - Introduction To ML

The document outlines the syllabus for a Machine Learning course (CS-245) taught by Dr. Mehwish Fatima at SEECS-NUST, focusing on foundational concepts, types of machine learning, and practical applications. It covers supervised and unsupervised learning, the ML pipeline, and challenges in machine learning. The course aims to equip students with skills in various algorithms and tools relevant to AI and data science.


Spring 2025

CS-245: Machine Learning
Dr. Mehwish Fatima
Assistant Professor, AI & DS Department, SEECS-NUST, Pakistan
WEEK 1: INTRODUCTION TO MACHINE LEARNING
AGENDA

01 Introduction to course
02 Artificial intelligence
03 What is machine learning
04 Types of machine learning
05 ML pipeline
06 Challenges in ML
INTRODUCTION TO COURSE
Course
● This course introduces the foundational concepts of machine learning
(ML) with a focus on understanding core algorithms, performance
evaluation, and practical implementation.
○ Supervised and unsupervised paradigms
○ Build, analyze, and evaluate ML models.
○ Regression, classification, clustering, dimensionality reduction, and debugging ML
systems
○ Learning with a project that applies ML techniques to solve practical problems.
Instructor
● PhD:
○ Ruprecht-Karls-Universität Heidelberg, Germany (2018–2024)
● Experience:
○ Industry & Academia (10+ years)

● Research:
○ https://scholar.google.com/citations?user=zEyTPkMAAAAJ&hl=en
● Research Area:
○ Generative AI (GenAI) & Natural Language Processing (NLP)
○ Computational Linguistics (CL)
○ Machine Learning (ML) & Deep Learning (DL)

● Practical Skills:
○ Languages & Frameworks: Python, PyTorch, TensorFlow, CUDA, C++/Java
○ Tools: DeepSpeed, Docker, Kubernetes, LangChain, AWS, Google Colab, GitHub, multi-GPU server deployments
ARTIFICIAL INTELLIGENCE
Artificial Intelligence
[Figure: nested fields, from outermost to innermost: Artificial Intelligence, Machine Learning, Deep Learning, GenAI]

The basic goal of AI is to develop intelligent machines.

This consists of many sub-goals:
• Perception
• Reasoning
• Control / Motion / Manipulation
• Planning
• Communication
• Creativity
• Learning
“Deep Style” from https://deepdreamgenerator.com/#gallery
History
WHAT IS MACHINE LEARNING?
Machine Learning
● Machine Learning (ML) is a subset of AI that enables models to learn
patterns from data and make decisions without being explicitly
programmed.

● Mathematically
○ A model 𝑓 learns a function that maps input 𝑋 to output 𝑌:
○ 𝑌 = 𝑓(𝑋) + ϵ, where ϵ represents the error or noise in predictions.
Machine Learning
● Logically
○ The goal of ML is to generalize from past data (training data) to make accurate
predictions on unseen data (test data).

● Example
○ Predicting house prices using features like square footage, location, and number of
rooms.
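As a minimal sketch of 𝑌 = 𝑓(𝑋) + ϵ for the house-price example, the code below fits a straight line to synthetic data with ordinary least squares. The "true" relationship (150 per square foot plus a base of 50,000), the noise level, and the helper name `fit_line` are assumptions made for illustration, not values from the course.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Assumed ground truth: price = 150 * sqft + 50_000, plus Gaussian noise (epsilon).
rng = random.Random(0)
sqft = [rng.uniform(500, 3000) for _ in range(200)]
price = [150 * x + 50_000 + rng.gauss(0, 10_000) for x in sqft]

# "Learning" f means estimating its parameters from the data.
slope, intercept = fit_line(sqft, price)
predicted = slope * 2000 + intercept

print(f"learned f(X) = {slope:.1f} * sqft + {intercept:.0f}")
print(f"predicted price for a 2000 sqft house: {predicted:.0f}")
```

The learned slope and intercept should land close to the assumed values but not exactly on them, because the noise term ϵ cannot be predicted from the features.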
Learning in Humans
● Imagine you're teaching a child to recognize different types of animals, like dogs and cats.
○ You show them pictures of 100 different dogs and 100 different cats, explaining which is which.
○ This is their training data: the pictures you've already shown them and that they've learned from.
Learning in Humans
● Now, if you show them a new picture of a dog they've never seen before,
○ you expect them to correctly say, "That's a dog!"
○ This is them generalizing their learning to new situations.
○ Even though they've never seen this exact dog before, they recognize it based on what they learned from the other dog pictures.
Machine Learning
● Training data is like showing the system lots of examples
○ like pictures of dogs and cats with correct labels: this is a "dog", this is a "cat".

● Test data is like showing the system a brand-new picture and asking it to guess what it is.

● The better the system is at generalizing, the better it will be at making accurate predictions on new data it hasn't seen before.
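The training/test idea can be sketched in a few lines. The example below is illustrative only: the single "weight" feature, the class distributions, and the midpoint-threshold "model" are all assumptions chosen to keep the code short. What matters is that the model is fitted on one part of the data and scored on a part it never saw.

```python
import random
import statistics

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples, then split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled examples: (weight in kg, label).
rng = random.Random(1)
examples = [(rng.gauss(30, 4), "dog") for _ in range(100)]
examples += [(rng.gauss(4, 1), "cat") for _ in range(100)]

train, test = train_test_split(examples)

# "Training": place a threshold halfway between the class means seen in training.
dog_mean = statistics.fmean(w for w, label in train if label == "dog")
cat_mean = statistics.fmean(w for w, label in train if label == "cat")
threshold = (dog_mean + cat_mean) / 2

def predict(weight):
    return "dog" if weight > threshold else "cat"

# "Testing": score the model only on examples it did not see during training.
accuracy = sum(predict(w) == label for w, label in test) / len(test)
print(f"train size={len(train)}, test size={len(test)}, test accuracy={accuracy:.2f}")
```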
Machine Learning
● Generalization
○ The goal is not just to memorize the training data
■ like remembering each dog individually
○ but to learn patterns
■ dogs have certain features like four legs, a tail, fur, etc.
○ that can apply to new, unseen examples.

This is what machine learning models aim to do when we say they are "generalizing".
Machine Learning
● Generalization
○ A model that has generalized well can handle new data and make correct
predictions,
■ like predicting house prices in a new neighborhood based on the
features (size, number of rooms, etc.) it learned during training.

○ A model that doesn't generalize well might only work on the data it has seen before and fail when presented with new data. When this happens because the model has memorized the training data rather than learned its patterns, it is called overfitting.
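To make the contrast between memorizing and generalizing concrete, here is a small sketch on synthetic data (assumed to follow y = 2x plus noise). The "memorizer" simply returns the label of the closest training example, while the second model fits a straight line. Both model choices are illustrative assumptions.

```python
import random

rng = random.Random(42)

def make_data(n):
    """Synthetic data, assumed for illustration: y = 2x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

train, test = make_data(200), make_data(1000)

def memorizer(x):
    """Pure memorization: return the label of the closest training example."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# A simple model: a least-squares line fitted to the training data.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

def line(x):
    return slope * x + intercept

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, model in (("memorizer", memorizer), ("line", line)):
    print(f"{name:10s} train MSE={mse(model, train):.2f}  test MSE={mse(model, test):.2f}")
```

The memorizer reproduces the training set perfectly because it also memorizes the noise, so its error on new data is higher than that of the line. A large gap between training error and test error is the usual symptom of overfitting.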
Traditional Programming Approach vs. ML
ML CLASSIFICATION
Types of Machine Learning
● There are so many different types of ML systems that it is useful to
classify them in broad categories based on:

○ Whether or not they are trained with human supervision (supervised, unsupervised,
semi-supervised, and reinforcement learning)

○ Whether or not they can learn incrementally on the fly (online versus batch learning)

○ Whether they work by simply comparing new data points to known data points, or
instead detect patterns in the training data and build a predictive model, much like
scientists do (instance-based versus model-based learning)
Types of Machine Learning
Machine Learning systems can be classified according to the amount and type
of supervision they get during training.
Supervised Learning
● The training data you feed to the algorithm includes the desired solutions, called labels.

● Mathematically
○ Learning a function that maps input 𝑋 to output 𝑌, where labels are provided.

● Use cases
○ Spam detection (classification), house price prediction (regression).
Supervised Learning
● Classification
○ A predictive model that approximates a mapping function from input variables to discrete output variables
■ labels or categories

○ The mapping function of a classification algorithm is responsible for predicting the label or category of the given input variables.

○ A classification algorithm can take both discrete and real-valued input variables, but it requires that the examples be classified into one of two or more classes.
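As one possible illustration of a classifier, the sketch below implements k-Nearest Neighbors for a toy spam-detection task. The two features (number of links, number of all-caps words) and the six training examples are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training examples closest to query.

    train is a list of (features, label) pairs, where features is a tuple of numbers.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (number of links, number of ALL-CAPS words).
train = [
    ((8, 12), "spam"), ((6, 9), "spam"), ((9, 15), "spam"),
    ((0, 1), "ham"), ((1, 0), "ham"), ((2, 2), "ham"),
]

print(knn_predict(train, (7, 10)))  # spam
print(knn_predict(train, (1, 1)))   # ham
```

The inputs are numeric, but the output is always one of a fixed set of classes, which is what makes this a classification problem.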
Supervised Learning
● Regression
○ Predicts a continuous value based on the input variables.

○ The main goal of regression problems is to estimate a mapping function based on the input and output variables.

○ If your target variable is a quantity like income, scores, height or weight, or the probability of a binary category (like the probability of rain in particular regions), then you should use a regression model.

○ Ex: House price prediction, where the target is a continuous sale price.


Supervised Learning
● Classification vs. Regression
○ Regression predicts a continuous quantity.
○ Classification predicts discrete class labels.

● Overlap
○ A regression algorithm can predict a discrete value if it is in the form of an integer quantity.
○ A classification algorithm can predict a continuous value if it is in the form of a class label probability.
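The second overlap case can be sketched with a logistic function: the model outputs a continuous probability, and a threshold turns it into a discrete label. The weight and bias below are hand-picked, hypothetical values, not learned parameters.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def spam_probability(num_links, weight=0.8, bias=-3.0):
    """Continuous output: probability that a message is spam (assumed parameters)."""
    return sigmoid(weight * num_links + bias)

def classify(num_links, threshold=0.5):
    """Discrete output: the class label obtained by thresholding the probability."""
    return "spam" if spam_probability(num_links) >= threshold else "ham"

for links in (0, 3, 8):
    print(links, round(spam_probability(links), 3), classify(links))
```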
Some Popular Algorithms
● k-Nearest Neighbors

● Linear Regression

● Logistic Regression

● Support Vector Machines (SVMs)

● Decision Trees and Random Forests

● Neural networks
Unsupervised Learning
● As you might guess, the training data is unlabeled. The system tries to learn without a teacher.

● Mathematically
○ Learning patterns in the data without any labels, typically by minimizing or maximizing an objective function.

● Use cases
○ Customer segmentation, anomaly detection.
Unsupervised Learning
● Clustering
○ The goal is to find natural groups or clusters in a feature space and interpret the input data.

○ The data points are divided so that each point falls into a group whose members are similar to it under a predefined similarity or distance metric in the feature space.

○ Ex: determining customer segments in marketing data.
■ Identifying different customer segments helps marketing teams approach each segment in its own way.
● Think of features like gender, location, age, education, income bracket, and so on.
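A minimal k-means sketch shows how clusters can be found from a distance metric alone. The customer data (age, income in thousands) is made up, and the implementation is a plain-Python illustration rather than a production algorithm.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, clusters) using Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # start from k random data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, income in thousands).
customers = [(22, 25), (25, 30), (24, 28), (23, 27),
             (55, 90), (60, 95), (58, 92), (57, 88)]

centroids, clusters = kmeans(customers, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere. The two segments emerge only because younger, lower-income customers sit close together in feature space, and likewise for the older, higher-income group.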
Unsupervised Learning
● Dimensionality reduction
○ The goal is to reduce the number of random variables under consideration.

○ This reduces the complexity of a problem by projecting the feature space onto a lower-dimensional space, so that fewer correlated variables are considered in a machine learning system.

○ Ex: Visualization algorithms try to preserve as much structure as they can
■ (e.g., trying to keep separate clusters in the input space from overlapping in the visualization),
○ to help you understand how the data is organized and perhaps identify unsuspected patterns.
Unsupervised Learning
● Feature extraction
○ The goal is to simplify the data without losing too much information.
○ One way to do this is to merge several correlated features into one.
○ Ex: a car's mileage may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car's wear and tear.
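The car example can be sketched with a two-feature principal component analysis. For two standardized, positively correlated features, the first principal component lies along the direction (1, 1)/√2, so the merged feature is the scaled sum of the two standardized values. The car data below is hypothetical.

```python
import math
import statistics

# Hypothetical cars: (age in years, mileage in thousands of km).
cars = [(1, 15), (2, 32), (3, 41), (5, 78), (7, 99), (10, 160)]
ages = [a for a, _ in cars]
mileages = [m for _, m in cars]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

z_age, z_mileage = standardize(ages), standardize(mileages)

# Pearson correlation between the two standardized features.
r = sum(a * m for a, m in zip(z_age, z_mileage)) / len(cars)

# Project each car onto the first principal component, (1, 1) / sqrt(2).
wear_and_tear = [(a + m) / math.sqrt(2) for a, m in zip(z_age, z_mileage)]

# Share of total variance kept by the single merged feature: (1 + r) / 2.
explained = (1 + r) / 2

print(f"correlation r = {r:.3f}, variance explained = {explained:.3f}")
print([round(w, 2) for w in wear_and_tear])
```

Because age and mileage are so strongly correlated here, one merged feature retains almost all of the variance of the original two.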
Unsupervised Learning
● Clustering
○ K-Means
○ Hierarchical Cluster Analysis (HCA)

● Anomaly detection and novelty detection
○ One-class SVM
○ Isolation Forest

● Visualization and dimensionality reduction
○ Principal Component Analysis (PCA)
○ Locally Linear Embedding (LLE)
Semi-supervised Learning
● Semi-supervised algorithms deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.

○ Ex: Google Photos. Once you upload all your family photos, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7.
■ This is the unsupervised part of the algorithm (clustering).

○ Now all the system needs is for you to tell it who these people are.
■ Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
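The labeling step of the photo example can be sketched as label propagation over clusters. The cluster assignments below stand in for the output of a face-clustering step that is assumed to have already run. The face identifiers and the names "Person A" and "Person B" are hypothetical.

```python
# Assumed output of the unsupervised step: face id -> cluster id.
face_cluster = {
    "photo1_face0": 0, "photo5_face0": 0, "photo11_face0": 0,
    "photo2_face0": 1, "photo5_face1": 1, "photo7_face0": 1,
}

# The supervised part: the user labels just one face per person.
user_labels = {"photo1_face0": "Person A", "photo2_face0": "Person B"}

# Propagate each label to every face in the same cluster.
cluster_name = {face_cluster[face]: name for face, name in user_labels.items()}
names = {face: cluster_name[cluster] for face, cluster in face_cluster.items()}

def photos_of(person):
    """Return the set of photos in which the given person appears."""
    return {face.split("_")[0] for face, name in names.items() if name == person}

print(photos_of("Person A"))
print(photos_of("Person B"))
```

Two labels are enough to name all six faces, which is the appeal of combining a little labeled data with a lot of unlabeled data.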
Reinforcement Learning
● The learning system, called an agent in this context,
○ can observe the environment,
○ select and perform actions, and
○ get rewards in return
○ or penalties in the form of negative rewards.

○ It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

○ A policy defines what action the agent should choose when it is in a given situation.
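The agent, reward, and policy vocabulary above can be illustrated with tabular Q-learning on a toy environment. Everything here is an assumption made for the sketch: a five-cell corridor with the goal at the right end, a reward of +1 at the goal, a penalty of -0.1 per step, and the chosen hyperparameters.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True       # reward for reaching the goal
    return nxt, -0.1, False         # penalty (negative reward) for every other step

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                        # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = train()
# The learned policy: for each situation (state), the action with the highest value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Nobody tells the agent that moving right is correct. It discovers this from rewards alone, and the resulting `policy` dictionary is exactly a mapping from situation to action.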
ML PIPELINE
ML Pipeline
● A real estate company wants to predict house prices based on various
factors. They use a machine learning model to help estimate the price of a
house based on its features.

○ Features: These are the characteristics or input variables of each house that are used
to predict the price.
■ Ex: Square footage, number of bedrooms, and age of the house.

○ Labels: This is the target value the model is trying to predict, which in this case is the
house price.
■ Ex: The actual sale price of the house, like $350,000.
ML Pipeline: Predicting House Prices
○ Training: The process in which the model learns from historical data, where both the features (house characteristics) and labels (house prices) are known.
■ The company uses past house sales data to train the model so it can learn the
relationship between features and the house price.
ML Pipeline: Predicting House Prices
○ Testing: The model is tested on unseen data to check how accurately it predicts house
prices for new examples.
■ The model is tested on new houses, where it predicts the price, and the
predictions are compared with the actual prices.
ML Pipeline: Predicting House Prices
○ Evaluation Metrics: These are measures used to assess how well the model performs.
■ Ex: Mean Squared Error (MSE), the average squared difference between predicted and actual prices, measures how far the predictions are from the actual prices. A lower MSE indicates more accurate predictions.
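As a small worked example of the metric, the function below computes MSE for three houses. Apart from the $350,000 sale price mentioned earlier, the prices are made-up values for illustration. The root of the MSE (RMSE) is also shown because it is expressed in the same units as the prices.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [350_000, 420_000, 275_000]      # sale prices
predicted = [340_000, 450_000, 280_000]   # model outputs (hypothetical)

mse = mean_squared_error(actual, predicted)
rmse = mse ** 0.5                         # same units as the prices

print(f"MSE  = {mse:,.0f}")
print(f"RMSE = {rmse:,.0f}")
```

Squaring means the single 30,000 miss dominates the total, so MSE penalizes a few large errors more heavily than many small ones.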
Challenges in ML
Data Challenges in ML
The two things that can go wrong are “bad algorithm” and “bad data.” Let’s
start with examples of bad data.

● Insufficient Quantity of Training Data
○ ML requires large amounts of data. A toddler can quickly learn a concept from just a few examples, but even simple ML tasks often need thousands of examples, and complex ones may need millions.
Data Challenges in ML
● Nonrepresentative Training Data
○ To ensure good generalization, training data must be representative of the cases you
want to predict, as missing or biased data can lead to poor model performance.
Data Challenges in ML
● Poor-Quality Data
○ If your training data contains errors, outliers, or noise, it will hinder pattern detection
and reduce system performance, making data cleaning a critical step in building
effective models.
Data Challenges in ML
● Irrelevant Features
○ As the saying goes, "garbage in, garbage out."
○ Your machine learning system's performance depends heavily on having relevant
training data features, making feature engineering—selecting, extracting, and creating
useful features—a crucial aspect of any successful ML project.
Model Challenges in ML
● Overfitting the Training Data
○ Overfitting is the ML counterpart of human overgeneralization: the model excels on the training data yet fails to generalize to new data, so its predictions cannot be trusted.
Model Challenges in ML
● Underfitting the Training Data
○ Underfitting is the opposite of overfitting. It occurs when a model is too simple to capture the underlying structure of the data, and it can be addressed by selecting a more powerful model, improving feature engineering, or reducing constraints on the model.
ML Vs. DL

Questions?
THANK YOU
