Ds Lect 14

The document provides an introduction to supervised learning in data science, focusing on machine learning algorithms that utilize labeled data for training, such as regression and classification. It distinguishes between continuous data, which is used in regression problems, and non-continuous (categorical) data, which is used in classification problems, detailing various algorithms applicable to each type. Key algorithms discussed include Linear Regression, Decision Trees, and Support Vector Regression for continuous data, and Decision Trees for classification tasks.

Uploaded by

faryalshahid808
Copyright
© All Rights Reserved
Introduction to Data Science

Supervised Learning

Overview

Instructor:
Rabia Tariq
Lecturer, Department of Computer Science
Email: [email protected]
Course Topics

• Supervised Machine Learning

• Basic Machine Learning Algorithms

• Regression Algorithms

• Classification Algorithms
•Machine Learning algorithms are computational methods that allow systems to learn patterns from data and make
predictions or decisions without being explicitly programmed.

•These algorithms can be categorized into:

•Supervised learning
•Unsupervised learning
•Reinforcement learning

Unsupervised Learning: Identifies patterns in unlabeled data (e.g., clustering, dimensionality reduction).

Supervised Learning: Uses labeled data for training (e.g., regression, classification).

Reinforcement Learning: Learns from actions and rewards (e.g., game-playing AI, robotics).
Continuous Data

•Continuous data consists of numerical values that can take any value within a range.
•These values are measurable and often represented as real numbers.
•Used in Regression problems.
Example: Predicting house prices based on area and location.
Algorithm Examples: Linear Regression, Decision Trees (for regression).

Non-Continuous (Categorical) Data

•Non-continuous data consists of discrete values, often representing categories or labels.
•Used in Classification problems.
Example: Predicting whether an email is spam or not (Spam/Not Spam).
Algorithm Examples: Naive Bayes, Decision Trees (for classification), k-NN.
Algorithms That Work on Continuous Data
1. Linear Regression
•Predicts a continuous output based on independent variables.
•Used in trend analysis, price forecasting, and risk assessment.
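As a minimal sketch of the idea, the snippet below fits a straight line to one independent variable with ordinary least squares. The function name `fit_line` and the area/price numbers are hypothetical, chosen only for illustration; they are not from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house area vs. price, generated as price = 2 * area + 10.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90.0 + intercept  # prediction for an unseen area
```

Because the toy data lies exactly on a line, the fit recovers that line; real data would leave some residual error.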
2. Polynomial Regression
•A variant of linear regression that models a nonlinear relationship using polynomial terms.
•Used for financial market trends, disease progression modeling.
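The "variant of linear regression" point can be sketched as follows: each input x is expanded into polynomial terms (1, x, x^2, ...), and an ordinary linear least-squares fit is run on those terms. The helper names and the toy data are assumptions made for this illustration, and the normal-equation solver is just one simple way to do the fit.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [w0, w1, ..., w_degree]."""
    k = degree + 1
    # Polynomial features: each x becomes the row [1, x, x^2, ...].
    rows = [[x ** p for p in range(k)] for x in xs]
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

The model is still linear in its coefficients; only the features are nonlinear in x.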
3. Decision Trees (for Regression)
•Splits data into subsets based on feature values to predict continuous outputs.
•Used in sales forecasting and weather prediction.
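A single split of a regression tree can be sketched like this: try each candidate threshold on a feature, keep the one that minimizes the summed squared error of the two subsets, and predict the mean of the subset a new point falls into. The function names and the temperature/sales numbers are hypothetical; a full tree would repeat this step recursively on each subset.

```python
def sse(values):
    """Sum of squared errors around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the lowest-error split."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive feature values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Hypothetical data: daily temperature vs. units sold.
temps = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
sales = [20.0, 22.0, 21.0, 80.0, 82.0, 81.0]
threshold, left_mean, right_mean = best_split(temps, sales)


def predict(temp):
    """Predict with the one-split tree: the mean of the matching subset."""
    return left_mean if temp <= threshold else right_mean
```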
4. Support Vector Regression (SVR)
SVR extends Support Vector Machines (SVM) to regression tasks. Instead of finding a classification boundary, it fits a function (a hyperplane in the feature space) that keeps as many training points as possible within a margin of width epsilon around it.
Concept: Unlike ordinary linear regression, which penalizes every error, SVR uses an epsilon-insensitive loss function that ignores errors smaller than a threshold epsilon.
Example: Stock price prediction.
•Used in financial time series forecasting.
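The epsilon-insensitive loss can be written in a few lines. The sketch below shows only the loss, not a full SVR solver; the function name and the price values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


epsilon = 0.5  # assumed tolerance, chosen only for this example

# Prediction off by 0.25: inside the tube, so it is not penalized.
small_error_loss = epsilon_insensitive_loss(100.0, 100.25, epsilon)

# Prediction off by 2.0: only the part beyond epsilon is penalized.
large_error_loss = epsilon_insensitive_loss(100.0, 102.0, epsilon)
```

A squared-error loss would penalize both predictions, which is the contrast with linear regression drawn above.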
5. Neural Networks (Regression-Based Architectures)
•Deep learning models that can capture complex relationships in continuous data.
•Used in image processing (e.g., predicting pixel intensities), climate modeling, and speech recognition.
6. K-Nearest Neighbors (for Regression - k-NN Regression)
•Predicts a continuous value for a new point as the average (or distance-weighted average) of the target values of its k nearest neighbors in the training data.
•Used in real estate price prediction, recommendation systems.
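A minimal sketch of k-NN regression on a single feature, using the plain (unweighted) average. The function name `knn_regress` and the area/price data are hypothetical, used only for illustration.

```python
def knn_regress(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    # Sort training pairs by distance from the query on the single feature.
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical data: house area vs. price.
areas = [50.0, 60.0, 70.0, 100.0, 120.0]
prices = [100.0, 120.0, 140.0, 200.0, 240.0]

# The two nearest areas to 65 are 60 and 70, so the prediction is their mean price.
predicted = knn_regress(areas, prices, query=65.0, k=2)
```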
Algorithms That Work on Non-Continuous (Categorical) Data in Supervised Learning

•Non-continuous data, also known as categorical or discrete data, consists of distinct labels or categories.
•Algorithms used for this type of data generally fall under classification models.
1. Decision Tree (Classification)

Decision Trees are used for both regression and classification, but in the case of categorical data, they help classify
instances into discrete labels.

How It Works
The dataset is split on feature values, choosing each split with an impurity measure such as the Gini Index (used in CART) or Entropy / Information Gain (used in the ID3 algorithm).
Internal nodes test a feature, branches represent the outcomes of that test, and leaf nodes hold the final class labels.
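The two impurity measures can be computed directly from the class labels in a node. This is a small sketch with hypothetical label lists; a tree-building algorithm would compare these values before and after each candidate split.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


mixed = ["Spam", "Spam", "Not Spam", "Not Spam"]  # evenly mixed node
pure = ["Spam", "Spam", "Spam", "Spam"]           # node with a single class

mixed_gini, mixed_entropy = gini(mixed), entropy(mixed)
pure_gini, pure_entropy = gini(pure), entropy(pure)
```

Both measures are zero for a pure node and largest for an evenly mixed one, so a good split is one that lowers them.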
Example: Classifying Emails as Spam or Not Spam
Feature: Contains “discount” in the subject line? → Yes/No
If Yes → More likely Spam
If No → Move to next decision (e.g., Contains attachment? Yes/No)
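The example tree can be written as nested conditions. Note that the slide does not say what the attachment question leads to, so the two leaves under it are assumptions made only to complete the sketch; the function name is also hypothetical.

```python
def classify_email(has_discount_in_subject, has_attachment):
    """Hand-written version of the example tree (not a learned model)."""
    # Root node: does the subject line contain "discount"?
    if has_discount_in_subject:
        return "Spam"
    # Next decision: does the email contain an attachment?
    # ASSUMPTION: the leaf labels below are not given in the lecture.
    if has_attachment:
        return "Spam"
    return "Not Spam"


label_a = classify_email(has_discount_in_subject=True, has_attachment=False)
label_b = classify_email(has_discount_in_subject=False, has_attachment=False)
```

In practice the questions and their order are learned from labeled emails using an impurity measure rather than written by hand.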
Comparison of Algorithms for Continuous vs. Non-Continuous Data
End