
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


Basics Of Machine Learning and Applications
CS 106

SUBMITTED TO: Mr. Abhishek Sir


SUBMITTED BY: MUSKAN SONI
(24/A04/071)
Table of Contents

S. No.  Experiment Name                                            Date
1       Write down 10 basic libraries for Python implementation   01/01/2025
2       Data preprocessing step-by-step implementation            03/01/2025
3       Basic mathematical function operations using Python       05/01/2025
4       Python data structures: array, list, vector, matrix,      07/01/2025
        dictionary with basic operations
5       Linear regression on house price prediction using         09/01/2025
        gradient descent
6       Confusion matrix using logistic regression                11/01/2025
7       Optimised model on a given dataset                        13/01/2025
8       KNN and K-Means clustering                                15/01/2025
9       Visualizing relationships with Matplotlib and Seaborn     17/01/2025
Experiment 1:
Write down 10 basic libraries for Python
implementation:
1.1: NumPy – Numerical Computations
What is NumPy?
NumPy (Numerical Python) is a fundamental library for numerical computing in
Python. It provides support for handling large multi-dimensional arrays and
matrices, along with mathematical functions to operate on these arrays.
Key Features:
• Provides high-performance multidimensional arrays.
• Supports mathematical and statistical operations.
• More efficient than Python lists (uses less memory).
Applications:
• Scientific computing.
• Data analysis and machine learning.
• Image and signal processing.
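A minimal sketch of NumPy in action:

import numpy as np

a = np.array([1, 2, 3, 4])        # a 1D array
print(a * 2)                      # vectorized arithmetic: [2 4 6 8]
print(a.mean(), a.std())          # built-in statistics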
1.2: Pandas – Data Manipulation & Analysis
What is Pandas?
Pandas is a powerful library used for data manipulation and analysis. It provides
two main data structures:
• Series (1D labeled array)
• DataFrame (2D labeled table similar to a spreadsheet)
Key Features:
• Handles structured data efficiently (CSV, Excel, databases).
• Supports filtering, grouping, merging, and statistical analysis.
• Works well with NumPy and Matplotlib.
Applications:
• Data preprocessing in machine learning.
• Financial and economic data analysis.
• Handling missing data in datasets.
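A minimal sketch of the two core structures, using made-up values:

import pandas as pd

s = pd.Series([10, 20, 30], name="marks")        # 1D labeled array
df = pd.DataFrame({"name": ["A", "B", "C"],      # 2D labeled table
                   "marks": [85, None, 90]})
print(s.mean())                                  # 20.0
print(df["marks"].fillna(df["marks"].mean()))    # one way to handle missing data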

1.3: Matplotlib – Data Visualization
What is Matplotlib?
Matplotlib is a Python library used to create static, animated, and
interactive plots. It allows users to generate bar charts, line graphs,
scatter plots, histograms, and more.
Key Features:
• Highly customizable visualization library.
• Can generate plots in multiple formats (PNG, PDF, SVG).
• Works well with NumPy and Pandas.
Applications:
• Data visualization in research and analysis.
• Exploratory Data Analysis (EDA) in machine learning.
• Plotting real-time sensor data.
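A minimal sketch of a simple plot:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y, marker="o")        # simple line plot
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("plot.png")           # also supports PDF and SVG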


1.4: Seaborn – Statistical Data Visualization
What is Seaborn?
Seaborn is built on top of Matplotlib and provides statistical visualizations with
better aesthetics. It is widely used for data analysis and visual storytelling.
Key Features:
• Supports advanced plots like heatmaps, violin plots, and pair plots.
• Works seamlessly with Pandas DataFrames.
• Includes built-in datasets for practice.
Applications:
• Data science projects.
• Statistical modeling and analysis.
• Correlation analysis (e.g., heatmaps).
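A minimal sketch, assuming Seaborn's built-in tips dataset (downloaded on first use):

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")                           # built-in practice dataset
corr = tips.select_dtypes("number").corr()                # numeric correlations
sns.heatmap(corr, annot=True)                             # correlation heatmap
plt.show()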

1.5: Scikit-learn – Machine Learning


What is Scikit-learn?
Scikit-learn is a popular Python library for machine learning and data mining. It
provides simple and efficient tools for classification, regression, clustering, and
dimensionality reduction.
Key Features:
• Built-in datasets for testing ML models.
• Supports supervised and unsupervised learning.
• Works well with NumPy, Pandas, and Matplotlib.
Applications:
• Predictive modeling (e.g., price predictions, medical diagnosis).
• Text classification (e.g., spam detection).
• Image recognition and face detection.
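A minimal sketch, assuming the built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                          # built-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)     # supervised learning
print(model.score(X_test, y_test))                         # test accuracy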

1.6: TensorFlow – Deep Learning & AI


What is TensorFlow?
TensorFlow is an open-source deep learning framework developed by Google. It
is used for building and training neural networks in AI applications.
Key Features:
• Supports both CPU and GPU acceleration for fast computations.
• Provides tools for building neural networks (ANN, CNN, RNN).
• Works well with large datasets and complex models.
Applications:
• Image and speech recognition.
• Self-driving cars (object detection).
• Natural Language Processing (NLP) in chatbots and translators.
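A minimal sketch, assuming TensorFlow 2.x (layer sizes are arbitrary):

import tensorflow as tf

# a tiny feed-forward network (ANN) for 10-class classification
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()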

1.7: Requests – HTTP Requests (Fetching Data from APIs)


What is Requests?
Requests is a library used to send HTTP requests in Python. It allows users to
fetch data from web services and APIs (Application Programming Interfaces).
Key Features:
• Supports GET, POST, PUT, DELETE requests.
• Handles authentication, cookies, and sessions.
• Can be used to scrape websites and fetch live data.
Applications:
• Fetching weather data from APIs.
• Automating web interactions.
• Downloading content from websites.
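A minimal sketch, using GitHub's public API as an example endpoint:

import requests

resp = requests.get("https://api.github.com")   # any JSON API works the same way
print(resp.status_code)                         # 200 means success
print(resp.json())                              # response body parsed as JSON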

1.8 : BeautifulSoup – Web Scraping


What is BeautifulSoup?
BeautifulSoup is a Python library used for web scraping. It helps extract specific
data from HTML and XML files.
Key Features:
• Parses HTML and XML files.
• Extracts data from web pages easily.
• Works well with Requests for scraping live websites.
Applications:
• Scraping job listings from websites.
• Extracting data from Wikipedia.
• Automating data collection from the web.
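A minimal sketch, fetching example.com and listing its links:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)              # the page <title>
for link in soup.find_all("a"):       # every hyperlink on the page
    print(link.get("href"))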
1.9: Flask – Web Development
What is Flask?
Flask is a lightweight web framework used to develop web applications in
Python. It is simple, scalable, and widely used for API development.
Key Features:
• Provides an easy way to create RESTful APIs.
• Has built-in support for routing; databases are handled through extensions (e.g., Flask-SQLAlchemy).
• Lightweight and easy to integrate with front-end frameworks.
Applications:
• Developing web-based machine learning applications.
• Creating REST APIs for mobile apps.
• Backend for small-scale web applications.
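A minimal sketch, with a hypothetical /hello endpoint:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")                  # a minimal REST endpoint
def hello():
    return jsonify(message="Hello, world")

if __name__ == "__main__":
    app.run(debug=True)               # serves on http://127.0.0.1:5000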
1.10: OpenCV – Image Processing & Computer Vision
What is OpenCV?
OpenCV (Open Source Computer Vision) is a library used for image processing
and computer vision tasks. It can handle image recognition, object detection,
and video analysis.
Key Features:
• Supports face detection, edge detection, and image transformations.
• Works with real-time video processing.
• Can integrate with deep learning models.
Applications:
• Face recognition in security systems.
• Object detection in autonomous vehicles.
• Barcode and QR code scanning.
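A minimal sketch, assuming an image file (the name "photo.jpg" is a placeholder):

import cv2

img = cv2.imread("photo.jpg")                   # placeholder file name
assert img is not None, "image not found"
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # grayscale conversion
edges = cv2.Canny(gray, 100, 200)               # Canny edge detection
cv2.imwrite("edges.png", edges)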


Summary Table

Library        Purpose
NumPy          Numerical computations, array handling
Pandas         Data manipulation and analysis
Matplotlib     Data visualization (charts, graphs)
Seaborn        Statistical data visualization
Scikit-learn   Machine learning models
TensorFlow     Deep learning and AI
Requests       Fetching data from web APIs
BeautifulSoup  Web scraping and data extraction
Flask          Web development (APIs, backend)
OpenCV         Image processing and computer vision

Experiment 2:
Data preprocessing step-by-step implementation:
Data preprocessing in machine learning (ML) is the process of transforming raw
data into a clean and structured format before feeding it into a machine learning
model. It's a crucial step because real-world data is often incomplete,
inconsistent, or noisy.
Why is it important?
Data preprocessing is essential because it prepares raw data for modeling by
cleaning, transforming, and organizing it. Most algorithms can’t handle missing
values, inconsistent formats, or non-numeric data, so preprocessing ensures the
data is usable. It improves model accuracy by helping the algorithm learn
patterns more effectively and reduces noise and bias for fairer predictions. It
also speeds up training and helps prevent overfitting or underfitting. Good
preprocessing leads to smarter, faster, and more reliable models.

Steps in Data Preprocessing:


2.1: Data Cleaning
1. Handling missing values by filling in with mean/median, or dropping
2. Removing duplicates

3. Fixing inconsistencies (e.g., typos or formatting issues)

2.2: Data Transformation
1. Encoding categorical data: Converting text labels into numbers

2. Log transformations or Box-Cox: Making data more normally distributed if needed

2.3: Feature Selection / Extraction

1. Selecting the most relevant variables (features) for the model
2. Creating new features (feature engineering)

2.4: Data Splitting


1. Dividing the dataset into training, validation, and test sets.

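A minimal sketch of the four steps above, using a made-up DataFrame:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# made-up raw data with a missing value, a duplicate row, and a categorical column
df = pd.DataFrame({
    "age": [25, 30, None, 40, 40],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Pune"],
    "purchased": [0, 1, 1, 0, 0],
})

df = df.drop_duplicates()                               # 2.1: remove duplicates
df["age"] = df["age"].fillna(df["age"].mean())          # 2.1: fill missing with mean
df["city"] = LabelEncoder().fit_transform(df["city"])   # 2.2: encode categories
X, y = df[["age", "city"]], df["purchased"]             # 2.3: select features
X_train, X_test, y_train, y_test = train_test_split(    # 2.4: split the data
    X, y, test_size=0.25, random_state=0)
print(X_train)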

EXPERIMENT 3:
Basic mathematical function operations using Python:
3.1: Basic Math Operations in Python

These are basic operations like addition, subtraction, multiplication, exponentiation, etc.,
performed using Python.

3.1.1: Using Functions for Math
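A minimal sketch of user-defined math functions:

def add(a, b):
    return a + b

def power(base, exp):
    return base ** exp            # exponentiation with **

print(add(3, 5))                  # 8
print(power(2, 10))               # 1024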

3.1.2: Using the math Module
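A minimal sketch of the math module:

import math

print(math.sqrt(16))              # 4.0
print(math.pi)                    # 3.141592653589793
print(math.sin(math.pi / 2))      # 1.0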


3.2: More Complex Mathematical Functions in Python

3.2.1: Logarithmic and Exponential Functions
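A minimal sketch:

import math

print(math.log(100, 10))          # base-10 logarithm: 2.0
print(math.log(math.e))           # natural logarithm: 1.0
print(math.exp(1))                # e raised to the power 1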

3.2.2: Factorials and Combinations
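A minimal sketch (math.comb and math.perm need Python 3.8 or newer):

import math

print(math.factorial(5))          # 120
print(math.comb(5, 2))            # 10 ways to choose 2 items from 5
print(math.perm(5, 2))            # 20 ordered arrangements of 2 from 5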


3.3: Rounding, Floor, Ceil & Modf
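A minimal sketch:

import math

print(round(3.567, 2))            # 3.57
print(math.floor(3.7))            # 3
print(math.ceil(3.2))             # 4
print(math.modf(3.75))            # (0.75, 3.0): fractional and integer parts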

3.4: Complex Numbers (cmath module)
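A minimal sketch:

import cmath

z = 3 + 4j
print(abs(z))                     # magnitude: 5.0
print(cmath.phase(z))             # angle in radians
print(cmath.sqrt(-1))             # 1j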


3.5: Using NumPy for Arrays of Math
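A minimal sketch:

import numpy as np

a = np.array([1, 4, 9, 16])
print(np.sqrt(a))                 # element-wise square root
print(np.log(a))                  # element-wise natural log
print(a.sum(), a.mean())          # aggregate operations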
EXPERIMENT 4:
To implement the Python data structures array, list, vector,
matrix, and dictionary with basic operations
1. Array
An array is a fixed-type collection of elements stored in a contiguous memory
location. In Python, it's created using the array module and holds data of the
same type.
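A minimal sketch of basic array operations:

import array

arr = array.array("i", [10, 20, 30])   # "i" = signed integer type code
arr.append(40)                         # add an element
arr.remove(20)                         # remove by value
print(arr[0], len(arr), list(arr))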

2. List
A list is a built-in, ordered, and mutable collection that can hold
elements of different types. It supports dynamic resizing and is used
frequently in Python programming.
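A minimal sketch of basic list operations:

nums = [1, "two", 3.0]            # mixed types are allowed
nums.append(4)                    # dynamic resizing
nums[0] = 100                     # lists are mutable
print(nums, nums[-1], len(nums))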

3. Vector
A vector is essentially a 1D array, often implemented using NumPy for
mathematical operations.
It supports efficient numerical computation and broadcasting in Python.

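A minimal sketch of vector operations with NumPy:

import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(v1 + v2)                    # element-wise addition
print(v1 * 2)                     # broadcasting with a scalar
print(np.dot(v1, v2))             # dot product: 32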

4. Matrix
A matrix is a 2D array-like structure used to represent rows and columns of
data. In Python, it’s usually implemented using NumPy for linear algebra and
data manipulation.

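A minimal sketch of matrix operations with NumPy:

import numpy as np

m = np.array([[1, 2], [3, 4]])
print(m.T)                        # transpose
print(m @ m)                      # matrix multiplication
print(np.linalg.det(m))           # determinant (about -2.0)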
5. Dictionary
A dictionary is a key-value pair data structure used to store and retrieve data
efficiently. Keys must be unique and immutable, while values can be any data
type.
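A minimal sketch of dictionary operations (values are examples):

student = {"name": "Muskan", "roll": 71}    # key-value pairs
student["course"] = "CS 106"                # insert a new key
print(student.get("name"))                  # safe lookup
del student["roll"]                         # remove a key
print(list(student.keys()), list(student.values()))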
EXPERIMENT 5:
Implement the linear regression model on house price prediction
and calculate the weight and bias using gradient descent
To implement a linear regression model for house price prediction and calculate the weight
and bias using gradient descent, we'll follow these steps:

1. Create a Dataset: For simplicity, let's generate a synthetic dataset where the
independent variable is the number of rooms in a house and the dependent variable is the
house price.

2. Linear Regression Model: The model equation will be y = w * x + b, where y is the
house price, x is the number of rooms, w is the weight (slope), and b is the bias (intercept).

3. Gradient Descent: We will use gradient descent to minimize the cost function (Mean
Squared Error, MSE) and calculate the optimal weight w and bias b.
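A minimal sketch, using made-up rooms/price values:

import numpy as np

# made-up synthetic data: number of rooms vs. price (in thousands)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([150, 200, 250, 300, 350], dtype=float)

w, b = 0.0, 0.0        # weight and bias, initialised to zero
lr = 0.01              # learning rate
n = len(x)

for _ in range(10000):
    y_pred = w * x + b
    error = y_pred - y
    dw = (2 / n) * np.dot(error, x)   # d(MSE)/dw
    db = (2 / n) * error.sum()        # d(MSE)/db
    w -= lr * dw
    b -= lr * db

print(f"w = {w:.2f}, b = {b:.2f}")    # should approach w = 50, b = 100

The learning rate and iteration count here are arbitrary choices; too large a learning rate makes the updates diverge instead of converging.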
EXPERIMENT 6
Prediction model: making a confusion matrix using logistic regression
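A confusion matrix summarizes a classifier's predictions against the true labels. A minimal sketch, assuming scikit-learn's built-in breast cancer dataset (the file does not name one):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test)))   # rows = true labels, columns = predicted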
EXPERIMENT 7

Implement an optimised model on a given dataset.


EXPERIMENT 8

Implement KNN and K-Means Clustering


K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification. It works by finding the ‘k’
closest data points (neighbors) to a new point and assigning the most common label among them. It’s simple and
effective, especially for small datasets.

On the other hand, K-Means Clustering is an unsupervised algorithm used to group data into clusters based on
similarity. It starts by choosing ‘k’ cluster centers and then repeatedly assigns points to the nearest cluster and
updates the centers. KNN needs labeled data, while K-Means works without labels and helps discover patterns
in data.
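A minimal sketch of both algorithms, assuming scikit-learn's built-in iris dataset (the file does not name one):

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# KNN: supervised, needs the labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))

# K-Means: unsupervised, never sees y
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster labels:", km.labels_[:10])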
EXPERIMENT 9

Use matplotlib and seaborn to visualize relationships in a dataset.
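A minimal sketch, assuming Seaborn's built-in tips dataset:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")                                  # built-in sample dataset
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")  # relationship between two variables
plt.title("Tip vs. total bill")
plt.show()

sns.pairplot(tips, hue="sex")                                    # all pairwise relationships at once
plt.show()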
