Unit 6

Supervised Machine Learning involves training models on labeled data to predict outcomes for new data, focusing on understanding relationships between inputs and outputs. It includes regression for continuous values and classification for categorical labels, utilizing various algorithms like Linear Regression, Decision Trees, and Neural Networks. Key challenges include overfitting, underfitting, and data quality issues, with best practices emphasizing data preprocessing and model evaluation.

Supervised Machine Learning

Supervised Machine Learning is a type of machine learning where a model is


trained using labeled data, meaning each input has a corresponding correct
output.

The goal is for the model to learn the relationship between inputs and outputs
so that it can accurately predict outcomes for new, unseen data.
"Supervised Learning is a machine learning paradigm where an algorithm learns
a function that maps an input (X) to an output (Y) based on example input-output
pairs (X, Y), minimizing the error between predicted and actual outputs."
• Key Characteristics:

✔ Labeled Data – Training data has both inputs and correct outputs.

✔ Prediction Task – Used for classification (categorical labels) and regression


(continuous values).

✔ Model Training – The algorithm learns from examples and generalizes to


unseen data.

✔ Feedback Mechanism – The model’s errors are corrected during training to


improve accuracy.
As a student, understanding Supervised Learning involves grasping:

✔ What it is 🤔

✔ How it works ⚙

✔ Types of problems it solves ✅

✔ Key algorithms 🏆

✔ Challenges and best practices 🚧


1. What is Supervised Machine Learning?

Supervised learning is like a teacher-student setup where:


The teacher (algorithm) provides feedback on right and wrong answers.
The student (model) learns from examples and improves over time.

For example:
If you train a model with past exam questions and answers, it will learn patterns
to predict answers for future questions.
2. How Does Supervised Learning Work?

1️⃣ Data Collection – Gather labeled data (e.g., images of cats and
dogs with labels).

2️⃣ Preprocessing – Clean and prepare the data for learning


(e.g., handling missing values, normalization).

3️⃣ Training the Model – The model learns patterns from


the labeled dataset.

4️⃣ Testing the Model – The model is evaluated using


unseen test data.
5️⃣ Improvement – Adjust parameters, increase data, or
tune the model to improve accuracy.
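A minimal sketch of these five steps, assuming scikit-learn and its bundled Iris dataset (any labeled dataset would do):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1: labeled data (flower measurements + species labels); Iris needs little preprocessing (Step 2)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: train the model on the labeled training set
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# Step 4: evaluate on unseen test data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 5: if accuracy is poor, tune max_depth, add more data, or try another algorithm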
3. Types of Supervised Learning Problems

There are two major types:


✅ Regression (Predicting Continuous Values)
Used when predicting numeric values.
Example: Predicting house prices based on area, location, etc.

✅ Classification (Predicting Categories)


Used when predicting categories or labels.
Example: Detecting spam emails (Spam/Not Spam).
4. Key Algorithms

Supervised learning approaches can be broadly categorized into Regression and
Classification methods.
1. Regression Algorithms (For Continuous Output Prediction)

Regression models predict a continuous numerical value based on input


features.

Types of Regression:
1. Linear Regression – Models relationships using a straight line (e.g., predicting
house prices).
2. Polynomial Regression – Uses polynomial functions to model non-linear
relationships.
3. Ridge and Lasso Regression – Regularized versions of linear regression to
prevent overfitting.
4. Support Vector Regression (SVR) – Uses a margin of tolerance to fit data.
5. Decision Tree Regression – Splits data into smaller regions for prediction.
6. Random Forest Regression – An ensemble of decision trees for better
accuracy.
7. Neural Network Regression – Deep learning-based approach for complex
non-linear patterns.
2. Classification Algorithms (For Categorical Output Prediction)
Classification models predict discrete class labels (e.g., spam detection, disease
classification).

Types of Classification:
1. Logistic Regression – Uses a sigmoid function to predict probabilities (e.g.,
spam or not spam).
2. K-Nearest Neighbors (KNN) – Classifies based on the majority of K nearest
neighbors.
3. Support Vector Machine (SVM) – Finds the best boundary (hyperplane) to
separate classes.
4. Decision Trees – Splits data into branches based on feature values.
5. Random Forest – Uses multiple decision trees to improve accuracy and
reduce overfitting.
6. Naïve Bayes – A probability-based classifier useful for text classification (e.g.,
sentiment analysis).
7. Neural Networks – Deep learning approach used for complex patterns like
image classification.
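A minimal sketch, assuming scikit-learn and its bundled breast-cancer dataset, that fits three of the classifiers above and compares their test accuracy:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))  # fraction of correct predictions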
5. Challenges & Best Practices

🚧 Overfitting – When the model memorizes data instead of generalizing


(solution: regularization, more data).

🚧 Underfitting – When the model is too simple to learn patterns (solution: use a
better model).

🚧 Bias-Variance Tradeoff – Balancing complexity and generalization.


🚧 Data Quality Issues – Handling missing, imbalanced, or noisy data.
Linear Regression:

Linear regression predicts the relationship between two


variables by assuming they have a straight-line connection.
It finds the best line that minimizes the differences between
predicted and actual values.
Used in fields like economics and finance, it helps analyze and
forecast data trends.
 Linear regression can also involve several variables
(multiple linear regression) or be adapted for yes/no questions
(logistic regression).
The linear regression model provides a sloped straight line representing the
relationship between the variables.

In machine learning, the line of regression refers to the line that best fits a set
of data points in linear regression analysis.

This line models the relationship between independent variable(s) and a


dependent variable by minimizing the differences between the observed values
and the values predicted by the line.
The equation of a simple linear regression model is:

Y = mX + b

Where:
Y = Predicted output (dependent variable)
X = Input feature (independent variable)
m = Slope of the line (coefficient)
b = Y-intercept (bias)
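For instance, with hypothetical values m = 100 (dollars per square foot) and b = 50,000, a
1,500 sq. ft. house would be priced at Y = 100 × 1,500 + 50,000 = 200,000.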
1️⃣ House Price Prediction:

💡 Scenario:
Suppose we want to predict the price of a house based on factors like area
(square feet) and the number of bedrooms.

Larger houses and more bedrooms usually increase the price.


House Price Prediction 🏡

X-axis: House Area (sq. ft.), Y-axis: Price ($)


Shows how larger houses tend to have higher prices.
Sales Forecasting 📈
X-axis: Advertising Spend ($), Y-axis: Sales (units)
Demonstrates that more advertising generally leads to higher sales.

Student Performance Prediction 📚


X-axis: Study Hours, Y-axis: Exam Score
Indicates that more study hours result in better scores.
Medical Diagnosis (BMI vs. Diabetes Risk) 🏥
X-axis: BMI, Y-axis: Diabetes Risk Score
Higher BMI values correspond to a higher risk of diabetes.

Stock Price Prediction 💹


X-axis: Past Stock Prices, Y-axis: Predicted Stock Prices
Suggests that stock trends follow a linear pattern over time.
2. Key Concepts in Linear Regression

✅ 2.1 Types of Linear Regression

1️⃣ Simple Linear Regression – One independent variable


(e.g., predicting salary based on years of experience).

2️⃣ Multiple Linear Regression – Multiple independent


variables (e.g., predicting house prices based on size, location, number of
bedrooms).
🔹 Interpretation:

Increasing size increases price by $100 per sq. ft.

More bedrooms add $5,000 per room.

Prime location adds a fixed $20,000 to the price.
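One hypothetical equation consistent with this interpretation (the coefficients below are assumed for illustration):

Price = 100 × Size (sq. ft.) + 5,000 × Bedrooms + 20,000 × PrimeLocation + b

where PrimeLocation is 1 for a prime location and 0 otherwise, and b is the base price.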


Notes:

✅ Simple Linear Regression uses one predictor (e.g., experience → salary).

✅ Multiple Linear Regression uses multiple predictors (e.g., house price → size,
bedrooms, location).

✅ Both help in prediction & decision-making in various industries (finance, real


estate, business, etc.).
Here are the graphs for Simple Linear Regression and Multiple Linear
Regression:

1️⃣ Simple Linear Regression (Salary vs. Years of


Experience):

The blue dots represent actual data points.

The red dashed line is the regression line, showing the trend that salary
increases with experience.
2️⃣ Multiple Linear Regression (House Price vs. Size &
Bedrooms):

The blue dots represent actual house data; each point in the 3D space represents a house.

The red surface is the regression plane, fitted through the data points to show how price
is influenced by both size and number of bedrooms.
✅ 2.2 Assumptions of Linear Regression

For linear regression to work properly, the following assumptions should hold:

✔ Linearity – The relationship between X and Y is linear.

✔ Independence – Observations are independent of each other.

✔ Homoscedasticity – The variance of errors remains constant.

✔ Normality of Residuals – The errors (residuals) are normally distributed.
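A minimal sketch for eyeballing these assumptions, assuming an already-fitted scikit-learn model named model and arrays X, Y (hypothetical names):

import matplotlib.pyplot as plt

# Residuals = actual − predicted; their pattern reveals violations of the assumptions
predicted = model.predict(X)
residuals = Y - predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(predicted, residuals)            # random scatter with constant spread → linearity & homoscedasticity
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Predicted values")
ax1.set_ylabel("Residuals")
ax2.hist(residuals, bins=10)                 # roughly bell-shaped → residuals approximately normal
ax2.set_xlabel("Residuals")
plt.show()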


3. Advantages & Disadvantages

✅ Advantages
✔ Simple to implement and interpret.
✔ Works well with small to medium datasets.
✔ Computationally efficient.

❌ Disadvantages
🚧 Assumes a linear relationship, which may not always hold.
🚧 Sensitive to outliers.
🚧 Struggles with multicollinearity (when input variables are highly correlated).
4. Applications of Linear Regression

📊 Predicting House Prices – Based on features like size, location, and number of
rooms.

📈 Stock Market Forecasting – Estimating future stock prices using past trends.

💰 Salary Prediction – Predicting an employee’s salary based on experience and


education.

🩺 Medical Research – Estimating disease progression based on patient data.


How Does Linear Regression Work in Python? 🚀

Python provides built-in methods to find relationships between data points


and fit a linear regression line without manually computing mathematical
formulas.

Instead of solving the linear regression equation manually, we use libraries


like NumPy, Matplotlib, and Scikit-Learn to:
✔ Compute the best-fit line
✔ Predict future values
✔ Visualize the relationship
Example: Car Speed vs. Age 🚗
Let’s say we collected data on 13 cars, measuring their age (in years) and their
speed (km/h) as they passed a tollbooth.
Step 1: Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

#Step 2: Define Data (Car Age & Speed)


# Independent variable (Car Age)
X = np.array([5, 7, 8, 10, 11, 12, 13, 14, 16, 18,
22, 24, 26]).reshape(-1, 1)

# Dependent variable (Car Speed)


Y = np.array([99, 86, 87, 88, 86, 83, 79, 77, 78, 73,
69, 68, 64])
#Step 3: Train the Linear Regression Model
model = LinearRegression()
model.fit(X, Y)

#Step 4: Predict the Speed Based on Car Age


Y_pred = model.predict(X)
#Step 5: Visualize the Linear Regression Line
plt.scatter(X, Y, color='blue', label="Actual Data")
plt.plot(X, Y_pred, color='red', label="Best Fit Line")
plt.xlabel("Car Age (Years)")
plt.ylabel("Car Speed (km/h)")
plt.legend()
plt.show()
Step 3: Train the Linear Regression Model 🚀

In this step, we train the Linear Regression model using the given dataset (car
age and speed).

Training means the model learns the relationship between the independent
variable (X: Car Age) and the dependent variable (Y: Car Speed) so that it can
make predictions on new data.
How Training Works in Python?
We use the LinearRegression() class from Scikit-Learn to create and train the
model.
Code for Training the Model:
from sklearn.linear_model import LinearRegression

# Create an instance of the Linear Regression model


model = LinearRegression()

# Train (fit) the model using the training data (X and Y)


model.fit(X, Y)
What Happens During Training? 🔍

When we call model.fit(X, Y), the model:

1️⃣Finds the best-fit line for the given data.


2️⃣Calculates the slope (m) and intercept (b) of the regression line:
Y=mX+b
3️⃣Minimizes the error (cost function) using the Least Squares Method, ensuring
that the line is as close as possible to all data points.
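A minimal sketch continuing the car example above: after fit(), the learned slope and intercept can be inspected and used for prediction.

print("Slope (m):", model.coef_[0])          # change in speed (km/h) per extra year of age
print("Intercept (b):", model.intercept_)    # predicted speed at age 0

# Predict the speed of a hypothetical 15-year-old car
print("Predicted speed at age 15:", model.predict([[15]])[0])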
Different Types of Lines in Linear Regression 📈

In Linear Regression, various lines can be drawn on a scatter plot based on how
well they fit the data.

Understanding these lines helps in evaluating the performance of a model.


1. Best Fit Line (Regression Line) 🎯

✔ The best fit line (also called the regression line) is the one that minimizes
the error between actual and predicted values.

✔ It is calculated using the Least Squares Method, ensuring the sum of squared
errors is the lowest.
📌 Equation of the Best Fit Line:

Y=mX+b
Where:

m = Slope of the line


b = Intercept
X = Independent variable
Y = Predicted dependent variable
👉 Example: Predicting car speed based on age.

The best fit line shows the trend (older cars tend to move slower).
2. Overfitted Line 🚀

✔ A model that memorizes noise in the training data instead of learning the
general trend.
✔ It has high variance and performs well on training data but poorly on test
data.

📌 Example: A highly complex polynomial curve that fits every data point
perfectly but fails on new data.

✅ Solution: Use regularization techniques (like Ridge or Lasso) or simplify the


model.
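A minimal sketch of the effect, assuming scikit-learn and the car-age data above: a degree-10 polynomial hugs the training points more tightly than the straight line, but is likely to do worse on new data.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

linear = LinearRegression().fit(X, Y)
poly = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(X, Y)

print("Linear R² on training data:", linear.score(X, Y))
print("Degree-10 R² on training data:", poly.score(X, Y))  # higher on training data, yet likely worse on unseen data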
3. Underfitted Line 🛑

✔ A model that is too simple to capture the pattern in the data.


✔ It has high bias and performs poorly on both training and test data.

📌 Example: A horizontal line that doesn’t consider variations in data, assuming


no relationship between X and Y.

✅ Solution: Use a more complex model or include more features in the dataset.
4. Residual Line (Error Line) 📉

✔ A residual line represents the difference between the actual value and the
predicted value.
✔ The goal is to minimize the residuals (errors) to get the best fit.

📌 Example: If the actual car speed is 80 km/h but the model predicts 85 km/h,
the residual is 5 km/h.

✅ Smaller residuals → better model performance.
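A minimal sketch, continuing the fitted car-speed model above:

residuals = Y - model.predict(X)             # one error (actual − predicted) per car
print(residuals)
print("Mean absolute residual:", abs(residuals).mean())  # smaller → better fit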


5. Horizontal Line (Mean Line) 📊

✔ A baseline model that predicts the mean of the dependent variable for every
input.
✔ Used for comparison to see if Linear Regression performs better.

📌 Example: If the average car speed is 75 km/h, this line predicts 75 km/h for
every car, ignoring age.

✅ A good regression model should perform better than the mean line.
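A minimal sketch of this comparison, assuming scikit-learn and the fitted car-speed model above; DummyRegressor always predicts the mean speed.

from sklearn.dummy import DummyRegressor

baseline = DummyRegressor(strategy="mean").fit(X, Y)   # predicts the mean of Y for every input
print("Mean-line baseline R²:", baseline.score(X, Y))  # 0.0 on the training data by definition
print("Linear regression R²:", model.score(X, Y))      # should be clearly higher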
Regularization in Machine Learning

When training a machine learning model, the model can be
easily overfitted or underfitted.

To avoid this, we use regularization in machine learning so that the
model fits the training data properly and still generalizes to unseen data.
Regularization is a technique used in machine learning to
prevent overfitting and improve the generalization performance
of models.

In essence, regularization adds a penalty term to the loss


function, discouraging the model from learning overly complex
patterns that may not generalize well to unseen data.

This helps create simpler, more robust models.


The main benefits of regularization include:

• Reducing overfitting: By constraining the model’s complexity,


regularization helps prevent the model from memorizing noise
or irrelevant patterns in the training data.
• Improving generalization: Regularized models tend to perform
better on new, unseen data because they focus on capturing
the underlying patterns rather than fitting the training data
perfectly.
• Enhancing model stability: Regularization makes models less
sensitive to small fluctuations in the training data, leading to
more stable and reliable predictions.
• Enabling feature selection: Some regularization techniques,
such as L1 regularization, can automatically identify and
discard irrelevant features, resulting in more interpretable
models.
The most common regularization techniques are L1
regularization (Lasso), which adds the absolute values of the
model weights to the loss function, and L2 regularization
(Ridge), which adds the squared values of the weights.

By incorporating these penalty terms, regularization strikes a


balance between fitting the training data and keeping the
model simple, ultimately leading to better performance on new
data.
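A minimal sketch, assuming scikit-learn and hypothetical training arrays X_train, y_train:

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # L2: adds the sum of squared weights to the loss
lasso = Lasso(alpha=0.1).fit(X_train, y_train)   # L1: adds the sum of absolute weights to the loss

print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)        # some may be exactly 0 → implicit feature selection
# alpha controls the penalty strength: larger alpha → simpler model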
Numerical problem on Linear Regression:

Question 1: Calculate the Regression Line Equation

• Given the dataset:
Find the linear regression equation Y = mX + c, where m is the slope and c is the intercept.
Solution:
• Step 1: Calculate the means of X and Y (X̄ and Ȳ).
• Step 2: Calculate the slope: m = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)².
• Step 3: Calculate the intercept: c = Ȳ − mX̄.
Thus, the regression equation is:
• Y = 2.2X − 1
🚀 Exploratory Data Analysis (EDA)
• Exploratory Data Analysis (EDA) is a crucial step in data analytics that
helps uncover patterns, detect anomalies, and summarize key
characteristics of a dataset before applying machine learning
models. 📊🔍
📌 Why is EDA Important?
• EDA allows data scientists to:
✅ Understand the dataset’s structure and distributions 📊
✅ Detect missing or inconsistent values ❌
✅ Identify correlations and relationships between variables 🔗
✅ Prepare data for further analysis 🛠️
🔍 Looking at & Cleaning Data
• Before analyzing data, we must clean and preprocess it:
✔️Handle missing values (drop, impute, or fill) ❓
✔️Detect and remove duplicate values 📑
✔️Identify outliers and correct them 📉
✔️Standardize data formats (e.g., dates, categories)
• import pandas as pd

• # Load dataset
• df = pd.read_csv("data.csv")

• # Check for missing values


• print(df.isnull().sum())

• # Fill missing values


• df.fillna(df.mean(), inplace=True)
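A minimal sketch of the remaining cleaning steps listed above (duplicates and outliers), assuming the same DataFrame df and a hypothetical numeric column "value":

# Remove exact duplicate rows
df = df.drop_duplicates()

# Flag outliers in one column with the 1.5 × IQR rule
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)]
print(outliers)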
📊 Summarizing Data with Statistics & Visualizations
• EDA involves summarizing data using descriptive statistics like:
✔️Mean, Median, Mode
✔️Standard deviation, Variance 📏
✔️Min, Max, Quartiles 📐
• # Summary statistics
• print(df.describe())

Visualization Tools
• 🔹 Bar Plots – Compare categories 📊
🔹 Histograms – Show data distribution 📈
🔹 Box Plots – Detect outliers 🎁
🔹 Pair Plots – Identify relationships 🔗
• import seaborn as sns
• import matplotlib.pyplot as plt

• # Histogram of a variable
• sns.histplot(df["column_name"], bins=20)
• plt.show()
❓ Asking & Answering Data Questions
• To extract deeper insights, ask questions like:
🔍 What is the average salary of employees in different departments? 💰
🔍 Which product has the highest sales over time? 📆
🔍 Is there a correlation between age and spending habits? 🤝
• # Group data by category
• df.groupby("Department")["Salary"].mean()
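A minimal sketch for the correlation question above, assuming hypothetical "Age" and "Spending" columns in df:

print(df["Age"].corr(df["Spending"]))  # Pearson correlation; close to ±1 means a strong linear relationship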

🐍 Using Python Libraries for EDA


• Python provides powerful libraries for EDA:
🟢 pandas – Data manipulation & analysis
🔵 NumPy – Numerical computations
🔴 Matplotlib – Basic visualizations
🟡 Seaborn – Advanced statistical plots
• import pandas as pd
• import numpy as np
• import matplotlib.pyplot as plt
• import seaborn as sns

• # Load dataset
• df = pd.read_csv("data.csv")

• # Quick overview
• print(df.head())

• # Correlation heatmap
• plt.figure(figsize=(8,6))
• sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
• plt.show()
🐍 Seaborn and Scikit-Learn: A Quick Guide
📊 Seaborn: Advanced Data Visualization Library
Seaborn is a Python library built on top of Matplotlib that makes
statistical data visualization easy and attractive. It is widely used for
data exploration and insight generation.
🔹 Features of Seaborn
• ✅ Built-in themes for better aesthetics 🎨
✅ Works well with pandas DataFrames 📑
✅ Supports statistical visualization (e.g., regression plots, pair plots) 📊
✅ Integrates well with Matplotlib 📈
• Common Seaborn Plots

• import seaborn as sns


• import matplotlib.pyplot as plt

• # Load an example dataset


• df = sns.load_dataset("tips")

• # Bar plot
• sns.barplot(x="day", y="total_bill", data=df, palette="coolwarm")
• plt.show()
🔹 Other Visualizations:
Histogram: sns.histplot(df["column"])
Box Plot (Outliers): sns.boxplot(x="category", y="value", data=df)
• Heatmap (Correlations): sns.heatmap(df.corr(), annot=True,
cmap="coolwarm")
🤖 Scikit-Learn: Machine Learning Library
Scikit-Learn (sklearn) is a powerful Python library for machine learning,
data preprocessing, and model evaluation.
🔹 Features of Scikit-Learn
• ✅ Supports supervised & unsupervised learning models 🔍
✅ Provides data preprocessing tools (e.g., scaling, encoding) ⚙️
✅ Implements various ML models (Regression, Classification,
Clustering) 🤖
✅ Offers performance evaluation metrics 📊
• 🔹 Example: Linear Regression with Scikit-Learn
• from sklearn.model_selection import train_test_split
• from sklearn.linear_model import LinearRegression
• from sklearn.metrics import mean_squared_error

• # Sample dataset
• import pandas as pd
• df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [2, 4, 5, 4, 5]})

• # Split data
• X = df[['X']]
• y = df['Y']
• X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

• # Train model
• model = LinearRegression()
• model.fit(X_train, y_train)

• # Predict & evaluate


• y_pred = model.predict(X_test)
• print("Mean Squared Error:", mean_squared_error(y_test, y_pred)) near Regression with Scikit-Learn
🔄 Seaborn vs. Scikit-Learn

Feature         | Seaborn 📊                        | Scikit-Learn 🤖
Purpose         | Data Visualization                | Machine Learning
Main Use        | Statistical Plots                 | Model Training & Prediction
Built On        | Matplotlib                        | NumPy, SciPy, Pandas
Examples        | Histograms, Heatmaps, Pair Plots  | Regression, Classification, Clustering
Libraries Used  | pandas, Matplotlib                | NumPy, SciPy, pandas
🚀 Summary
• 🔹 Seaborn is great for visualizing data before applying ML models.
🔹 Scikit-Learn is a machine learning toolkit for training & evaluating
models.
1. Load the Flights dataset and create a pivot table showing the number
of passengers for each month and year, then visualize it.

2. Generate a pie chart using Matplotlib for the number of meals ordered
on each day in the Tips dataset. (Hint: Use df['day'].value_counts())

3. Use Seaborn to create a heatmap showing missing values (if any) in the
Titanic dataset. (Hint: Use sns.heatmap(df.isnull(), cmap='coolwarm'))
🚀 End-to-End Example: Using Seaborn & Scikit-Learn for Predicting
House Prices
• We'll perform Exploratory Data Analysis (EDA) using Seaborn and
build a Linear Regression Model using Scikit-Learn to predict house
prices. 🏠📊🤖
• 📌 Step 1: Import Libraries

• import pandas as pd
• import numpy as np
• import seaborn as sns
• import matplotlib.pyplot as plt
• from sklearn.model_selection import train_test_split
• from sklearn.linear_model import LinearRegression
• from sklearn.metrics import mean_squared_error, r2_score
📌 Step 2: Load Dataset
• We'll use a sample dataset of house prices.
• # Sample dataset (features: size, bedrooms, age, price)
• data = {
• "Size": [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
• "Bedrooms": [3, 3, 3, 4, 2, 3, 4, 4, 2, 3],
• "Age": [20, 15, 18, 25, 10, 22, 30, 8, 12, 18],
• "Price": [245000, 312000, 279000, 308000, 199000, 219000, 405000, 450000,
215000, 310000]
•}

• df = pd.DataFrame(data)

• # Display first few rows


• print(df.head())
📌 Step 3: Exploratory Data Analysis (EDA) using Seaborn
• 1️⃣Check Correlation (Heatmap)

• plt.figure(figsize=(8, 6))
• sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt=".2f")
• plt.title("Feature Correlation Heatmap")
• plt.show()

• 💡 Insight: This heatmap shows relationships between variables. Size


and Price have a strong positive correlation! 📈
• 2️⃣Pair Plot for Feature Relationships
• sns.pairplot(df)
• plt.show()

• 💡 Insight: Scatter plots help us see trends, like larger houses costing
more. 🏠💲
• 3️⃣Box Plot to Detect Outliers

• sns.boxplot(data=df, palette="Set2")
• plt.title("Box Plot for Outlier Detection")
• plt.show()

• 💡 Insight: If we see extreme values, we might need to remove or


transform them. 🚀
📌 Step 4: Train Machine Learning Model (Scikit-Learn)
• 1️⃣Define Features & Target

• X = df[['Size', 'Bedrooms', 'Age']] # Features


• y = df['Price'] # Target variable (house price)
• 2️⃣Split Data into Training & Testing Sets

• X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,


random_state=42)

• 3️⃣Train Linear Regression Model
• model = LinearRegression()
• model.fit(X_train, y_train)

• 4️⃣Predict House Prices


• y_pred = model.predict(X_test)
• 5️⃣Evaluate Model Performance

• mse = mean_squared_error(y_test, y_pred)


• r2 = r2_score(y_test, y_pred)

• print("Mean Squared Error:", mse)


• print("R-squared Score:", r2)

💡 Insight: Lower MSE and higher R² mean a better model. 📉✅
• 📌 Step 5: Visualizing Predictions vs. Actual Prices

• plt.figure(figsize=(8,5))
• sns.scatterplot(x=y_test, y=y_pred, color="blue", label="Predicted vs Actual")
• plt.xlabel("Actual Prices")
• plt.ylabel("Predicted Prices")
• plt.title("Actual vs Predicted House Prices")
• plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)],
color="red", linestyle="--")
• plt.legend()
• plt.show()
• 💡 Insight: The closer points are to the red diagonal line, the better our
predictions. 🎯

🎯 Final Summary
• ✅ Seaborn helped visualize relationships between house features and
price. 📊
✅ Scikit-Learn built a Linear Regression model to predict house prices.
🤖
✅ The correlation heatmap & scatter plots helped us choose features
for the model. 🔍
✅ The model was evaluated using MSE & R2R^2R2, and its
performance was visualized. 🚀
1. Generate a pie chart using Matplotlib for the number of meals ordered daily in
the Tips dataset. (Hint: Use df['day'].value_counts())

• 2. Create a bar plot to show the average tip for each day using the Tips dataset.

• 3. Create a customized visualization of the correlation matrix for the diamonds


dataset, with annotations and a specific color palette.
