ML Lab File
1.4: Seaborn – Statistical Data Visualization
What is Seaborn?
Seaborn is built on top of Matplotlib and provides statistical visualizations with
better aesthetics. It is widely used for data analysis and visual storytelling.
Key Features:
• Supports advanced plots like heatmaps, violin plots, and pair plots.
• Works seamlessly with Pandas DataFrames.
• Includes built-in datasets for practice.
Applications:
• Data science projects.
• Statistical modeling and analysis.
• Correlation analysis (e.g., heatmaps).
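As a rough illustration of the correlation-analysis use case, the sketch below draws a heatmap from a small Pandas DataFrame. The column names, the values, and the output file name are made up for this example, and the Agg backend is chosen only so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: the values are invented purely for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_slept": [9, 8, 8, 7, 6, 5],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

# Draw the correlation matrix as an annotated heatmap.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation heatmap")
plt.tight_layout()
plt.savefig("correlation_heatmap.png")  # hypothetical output file name
```

Because Seaborn accepts the DataFrame returned by `df.corr()` directly, the row and column labels of the heatmap come from the column names without extra code.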
Summary Table
Library    Purpose
NumPy      Numerical computations, array handling
Experiment 2:
Data preprocessing steps with implementation:
Data preprocessing in machine learning (ML) is the process of transforming raw
data into a clean and structured format before feeding it into a machine learning
model. It's a crucial step because real-world data is often incomplete,
inconsistent, or noisy.
Why is it important?
Data preprocessing is essential because it prepares raw data for modeling by
cleaning, transforming, and organizing it. Most algorithms can’t handle missing
values, inconsistent formats, or non-numeric data, so preprocessing ensures the
data is usable. It improves model accuracy by helping the algorithm learn
patterns more effectively and reduces noise and bias for fairer predictions. It
also speeds up training and helps prevent overfitting or underfitting. Good
preprocessing leads to smarter, faster, and more reliable models.
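To make the point about missing values concrete, here is a minimal sketch of one common cleaning step, mean imputation, using NumPy. The feature matrix and the meaning of its columns are hypothetical, and real pipelines often use Pandas or scikit-learn for this step instead.

```python
import numpy as np

# Hypothetical raw data: columns are [age, salary]; np.nan marks a missing entry.
raw = np.array([
    [25.0, 50000.0],
    [np.nan, 62000.0],
    [35.0, np.nan],
    [40.0, 58000.0],
])

# Column means computed while ignoring the missing entries.
col_means = np.nanmean(raw, axis=0)

# Replace each missing entry with the mean of its own column.
clean = np.where(np.isnan(raw), col_means, raw)

print(clean)
```

Mean imputation is only one possible choice; dropping incomplete rows or using the median are equally simple alternatives.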
2.2: Data Transformation
1. Encoding categorical data: Converting text labels into numbers
INPUT:
OUTPUT:
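A minimal sketch of encoding categorical data in plain Python, assuming a hypothetical column of color names. It shows label encoding (one integer per category) and, as a related technique, one-hot encoding; libraries such as scikit-learn provide ready-made encoders for the same job.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green", "red"]

# Label encoding: give each distinct category an integer code (alphabetical order).
categories = sorted(set(colors))
code_of = {category: code for code, category in enumerate(categories)}
label_encoded = [code_of[c] for c in colors]

# One-hot encoding: one 0/1 column per category, which avoids implying an order.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]

print(categories)
print(label_encoded)
print(one_hot)
```

Label encoding is compact but suggests that "red" is greater than "blue"; one-hot encoding is often preferred when the categories have no natural order.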
EXPERIMENT 3
Basic mathematical operations using Python:
3.1. Basic Math Operations in Python
These are basic operations such as addition, subtraction, multiplication, and
exponentiation, performed using Python.
Output:
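The operations named above can be sketched as follows. The operands are arbitrary example values, and the `math` module from the standard library supplies the square root and exponential functions.

```python
import math

a, b = 7, 3  # example operands, chosen arbitrarily

results = {
    "addition": a + b,
    "subtraction": a - b,
    "multiplication": a * b,
    "division": a / b,            # true division always returns a float
    "floor division": a // b,
    "modulus": a % b,
    "power": a ** b,
    "square root": math.sqrt(16),
    "exponential": math.exp(1),   # e raised to the power 1
}

for name, value in results.items():
    print(f"{name}: {value}")
```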
2. List
A list is a built-in, ordered, and mutable collection that can hold
elements of different types. It supports dynamic resizing and is used
frequently in Python programming.
INPUT:
OUTPUT:
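A short sketch of the list properties described above (ordered, mutable, mixed types, dynamic resizing). The element values are arbitrary.

```python
# A list can hold elements of different types.
items = [10, "apple", 3.5]

items.append(True)        # lists grow dynamically
items[1] = "banana"       # lists are mutable: replace an element in place
items.insert(0, "first")  # insert at a specific position

removed = items.pop()     # remove and return the last element
first_two = items[:2]     # slicing keeps the original order

print(items)
print(removed)
print(first_two)
```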
3. Vector
A vector is essentially a 1D array, often implemented using NumPy for
mathematical operations.
It supports efficient numerical computation and broadcasting in Python.
Input:
Output:
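A minimal sketch of vectors as 1D NumPy arrays, with arbitrary example values. It shows element-wise arithmetic, broadcasting of a scalar, the dot product, and the Euclidean norm.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

total = v + w              # element-wise addition
scaled = v * 2             # the scalar 2 is broadcast to every element
dot = np.dot(v, w)         # 1*4 + 2*5 + 3*6
norm = np.linalg.norm(v)   # Euclidean length of v

print(total, scaled, dot, norm)
```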
4. Matrix
A matrix is a 2D array-like structure used to represent rows and columns of
data. In Python, it’s usually implemented using NumPy for linear algebra and
data manipulation.
Input:
Output:
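A minimal sketch of matrices as 2D NumPy arrays, using two small example matrices. It covers matrix multiplication, the transpose, the determinant, and the inverse.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

product = A @ B                  # matrix multiplication (rows of A times columns of B)
transpose = A.T                  # swap rows and columns
determinant = np.linalg.det(A)   # 1*4 - 2*3
inverse = np.linalg.inv(A)       # exists because the determinant is non-zero

print(product)
print(transpose)
print(determinant)
print(inverse)
```

Note that `A * B` would multiply element by element; the `@` operator is the one that performs the linear-algebra product.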
5. Dictionary
A dictionary is a key-value pair data structure used to store and retrieve data
efficiently. Keys must be unique and immutable, while values can be any data
type.
Input:
Output:
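A short sketch of the dictionary behaviour described above. The record and its field names are hypothetical.

```python
# Hypothetical student record stored as key-value pairs.
student = {"name": "Asha", "age": 21}

student["branch"] = "CSE"   # add a new key
student["age"] = 22         # keys are unique: assigning again overwrites the value

age = student.get("age")                       # look up an existing key
email = student.get("email", "not provided")   # default for a missing key

# Keys must be immutable, so a tuple works as a key while a list would not.
grades = {("math", 1): 90}

removed = student.pop("branch")   # remove a key and return its value

for key, value in student.items():
    print(key, "->", value)
```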
EXPERIMENT 5:
Implement a linear regression model for house price prediction and
calculate the weight and bias using gradient descent
To implement a linear regression model on house price prediction and calculate the weight
and bias using gradient descent, we'll follow these steps:
1. Create a Dataset: For simplicity, let's generate a synthetic dataset where the
independent variable is the number of rooms in a house and the dependent variable is the
house price.
3. Gradient Descent: We will use gradient descent to minimize the cost function (Mean
Squared Error, MSE) and calculate the optimal weight w and bias b.
Input:
Output:
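The steps above can be sketched as follows. This is a minimal illustration, not the original lab listing: the dataset is synthetic (price in thousands assumed to be roughly 50 per room plus 100, with small random noise), and the learning rate and iteration count are assumptions chosen so that the updates converge on this data.

```python
import numpy as np

# Synthetic dataset (hypothetical): number of rooms -> price in thousands.
rng = np.random.default_rng(0)
rooms = np.arange(1, 9, dtype=float)   # houses with 1 to 8 rooms
prices = 50.0 * rooms + 100.0 + rng.normal(0.0, 5.0, size=rooms.size)

def gradient_descent(x, y, learning_rate=0.02, n_iters=5000):
    """Fit y ~ w * x + b by minimizing the Mean Squared Error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y              # prediction minus target
        dw = (2.0 / n) * np.sum(error * x)   # dMSE/dw
        db = (2.0 / n) * np.sum(error)       # dMSE/db
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

w, b = gradient_descent(rooms, prices)
mse = np.mean((w * rooms + b - prices) ** 2)

print("weight:", w)
print("bias:", b)
print("MSE:", mse)
```

On this data the learned weight and bias should land close to the values used to generate it; a much larger learning rate would make the updates diverge because the feature is not scaled.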
EXPERIMENT 6
Building a prediction model with logistic regression and evaluating it with a confusion matrix
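A minimal sketch of this experiment using only NumPy, under stated assumptions: the dataset (hours studied versus pass/fail) is hypothetical, logistic regression is fitted with plain gradient descent rather than a library solver, and the helper names `fit_logistic` and `confusion_matrix` are written here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, n_iters=2000):
    """Fit weights and bias by gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)                         # predicted probabilities
        w -= learning_rate * (X.T @ (p - y)) / len(y)  # gradient w.r.t. weights
        b -= learning_rate * np.mean(p - y)            # gradient w.r.t. bias
    return w, b

def confusion_matrix(y_true, y_pred):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Standardize the feature so gradient descent converges quickly.
X = ((hours - hours.mean()) / hours.std()).reshape(-1, 1)

w, b = fit_logistic(X, passed)
predicted = (sigmoid(X @ w + b) >= 0.5).astype(int)

cm = confusion_matrix(passed, predicted)
accuracy = np.trace(cm) / cm.sum()

print(cm)
print("accuracy:", accuracy)
```

The diagonal of the matrix counts correct predictions (true negatives and true positives), while the off-diagonal entries count false positives and false negatives.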
EXPERIMENT 7
K-Means Clustering is an unsupervised algorithm used to group data into clusters based on similarity. It starts
by choosing ‘k’ cluster centers and then repeatedly assigns each point to the nearest center and updates each
center to the mean of its assigned points, until the centers stop changing. K-Nearest Neighbors (KNN) needs
labeled data, while K-Means works without labels and helps discover patterns in data.
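The assign-and-update loop of K-Means can be sketched with NumPy as below. The points are hypothetical, the initial centers are picked at random from the data (one of several possible initialization schemes), and `k_means` is a name written for this illustration rather than a library function.

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Basic K-Means: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every center, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)   # index of the nearest center
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # stop once the centers settle
            break
        centers = new_centers
    return centers, labels

# Hypothetical 2D data forming two visibly separate groups.
points = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
    [8.0, 8.0], [9.0, 9.5], [8.5, 9.0],
])

centers, labels = k_means(points, k=2)

print(centers)
print(labels)
```

No labels are passed to `k_means`; the grouping comes only from the distances between points, which is what makes the method unsupervised.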
EXPERIMENT 9