Week 3

The document discusses the importance of Python libraries Pandas and Matplotlib for machine learning applications, highlighting their roles in data manipulation, visualization, and model evaluation. Pandas is essential for data cleaning, transformation, and exploration, while Matplotlib is used for creating various visualizations to understand data and evaluate model performance. The document provides examples of using both libraries in a complete machine learning workflow.

Uploaded by

Sahithya Chandana

Week 3: Study of Python Libraries for ML Applications such as Pandas and Matplotlib

• Python is a popular language for machine learning (ML) applications, and several libraries are frequently used for data manipulation, visualization, and machine learning tasks. Among the most widely used libraries for ML applications are Pandas and Matplotlib.
• These libraries are essential for data preprocessing, exploration, and visualization.

1. Pandas: Data Manipulation and Analysis

Pandas is a powerful library for data manipulation and analysis. It provides two key data structures:

• Series: a one-dimensional labeled array.
• DataFrame: a two-dimensional labeled data structure, similar to a table (rows and columns).
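
The two structures can be illustrated with a minimal sketch. The names and values below (scores, students, and the three student labels) are invented for illustration and are not part of the original notes.

```python
import pandas as pd

# A Series: one-dimensional values, each addressed by an index label.
scores = pd.Series([82, 91, 77], index=["alice", "bob", "carol"], name="score")

# A DataFrame: a table whose columns share one row index.
students = pd.DataFrame(
    {"score": [82, 91, 77], "passed": [True, True, False]},
    index=["alice", "bob", "carol"],
)

print(scores["bob"])             # label-based lookup in a Series
print(students.loc["carol"])     # one row of the DataFrame, returned as a Series
print(students["score"].mean())  # select one column, then aggregate it
```

Selecting a single column of a DataFrame gives back a Series, which is why most column-level operations look the same on both structures.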

Key Uses of Pandas in ML:

• Data cleaning and preprocessing: Handling missing data, filtering, and transforming data.
• Data transformation: Applying functions to columns and rows.
• Data exploration: Summarizing and analyzing datasets.
• Merging and joining: Combining multiple datasets.

Example: Pandas for Data Loading, Cleaning, and Preprocessing

import pandas as pd

# Load dataset ('data.csv' is a placeholder file name)
df = pd.read_csv('data.csv')  # Read a CSV file into a DataFrame

# Show the first few rows of the dataset
print(df.head())

# Check for missing data
print(df.isnull().sum())  # Count the missing values in each column

# Drop rows with missing values
df_clean = df.dropna()

# Fill missing values in numeric columns with the column mean
# (numeric_only=True skips text columns, which have no mean)
df_filled = df.fillna(df.mean(numeric_only=True))

# Filter data based on a condition (e.g., values > 50 in a column)
filtered_data = df[df['column_name'] > 50]

# Group data by a category and average the numeric columns
grouped_data = df.groupby('category_column').mean(numeric_only=True)

# Basic statistics
print(df.describe())  # Summary statistics like mean, std, min, max, etc.

Key Features in Pandas for ML:

• Handling missing data: fillna(), dropna().
• Aggregation: groupby(), pivot_table().
• Merging and joining: merge(), concat().
• Data transformation: apply(), map(), applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()).
• Data visualization: Integrated plotting built on Matplotlib (e.g., DataFrame.plot()).
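
Several of the features listed above can be combined in one short sketch. The sales and regions tables and all their values are hypothetical, made up only to show how the calls fit together.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, 120.0, 80.0, None],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})

# Handling missing data: fill the gap with the column mean.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Merging: attach each store's region with a left join on the shared key.
merged = sales.merge(regions, on="store", how="left")

# Aggregation: total revenue per region, then a store-by-quarter pivot table.
per_region = merged.groupby("region")["revenue"].sum()
pivot = merged.pivot_table(index="store", columns="quarter",
                           values="revenue", aggfunc="sum")

# Transformation: map() on a Series, apply() across the rows of a DataFrame.
merged["region_code"] = merged["region"].map({"north": 0, "south": 1})
merged["label"] = merged.apply(lambda row: f"{row['store']}-{row['quarter']}", axis=1)

print(per_region)
print(pivot)
```

Filling with the mean is only one possible strategy; for skewed columns the median is a common alternative.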

2. Matplotlib: Data Visualization

Matplotlib is a popular plotting library for creating static, animated, and interactive visualizations in Python. It is frequently used for visualizing data and results in machine learning.

Key Uses of Matplotlib in ML:

• Visualizing datasets: Creating various charts such as bar charts, line plots, scatter plots, histograms, etc.
• Evaluating model performance: Plotting results such as confusion matrices, ROC curves, or loss and accuracy over epochs during model training.
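
As a rough sketch of the second use, the snippet below plots training and validation loss over epochs. The loss values are made up for illustration; in practice they would be recorded by a training loop.

```python
import matplotlib.pyplot as plt

# Made-up training history, for illustration only.
epochs = [1, 2, 3, 4, 5]
train_loss = [0.90, 0.62, 0.45, 0.36, 0.31]
val_loss = [0.95, 0.70, 0.55, 0.50, 0.49]

# One figure with both curves on the same axes.
fig, ax = plt.subplots()
ax.plot(epochs, train_loss, marker="o", label="Training loss")
ax.plot(epochs, val_loss, marker="s", label="Validation loss")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.set_title("Loss over Epochs")
ax.legend()
```

Calling plt.show() afterwards displays the figure. A validation curve that flattens while the training curve keeps falling, as in these invented numbers, is a typical sign of overfitting.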

Example: Basic Plotting with Matplotlib

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x)

# Line plot
plt.plot(x, y, label='Sine wave', color='b')
plt.title('Sine Wave Example')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

# Scatter plot
x2 = np.random.rand(50)
y2 = np.random.rand(50)
plt.scatter(x2, y2, color='r', alpha=0.7)
plt.title('Random Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

# Histogram
data = np.random.randn(1000)
plt.hist(data, bins=30, color='green', edgecolor='black')
plt.title('Histogram of Random Data')
plt.show()

Key Features in Matplotlib for ML:

• Line plots: Great for showing trends or relationships.
• Scatter plots: Useful for visualizing relationships between two variables.
• Bar charts: Ideal for categorical data comparison.
• Histograms: Useful for displaying the distribution of data.
• Subplots: Combine multiple plots in a single figure for comparison.
• Customization: Control over colors, markers, lines, axes, and titles.
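
Bar charts and subplots are not covered by the basic plotting example, so here is a minimal sketch of both. The class names, counts, and random feature values are assumptions chosen purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented data: class counts for the bar chart, random values for the histogram.
rng = np.random.default_rng(0)
values = rng.normal(size=500)
categories = ["cat", "dog", "bird"]
counts = [12, 18, 7]

# Two plots side by side in a single figure.
fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_bar.bar(categories, counts, color="steelblue")
ax_bar.set_title("Samples per class")

ax_hist.hist(values, bins=20, color="green", edgecolor="black")
ax_hist.set_title("Feature distribution")

fig.tight_layout()  # keep titles and tick labels from overlapping
```

plt.subplots returns the figure and one axes object per panel, so each panel can be customized independently before plt.show() is called.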

Example: Visualizing Model Performance (e.g., ROC Curve)

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Generate a synthetic binary classification dataset

X, y = make_classification(n_samples=1000, n_features=20,
n_classes=2, random_state=42)

# Train a logistic regression model

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)

# Get predicted probabilities

y_pred_prob = model.predict_proba(X_test)[:, 1]

# Calculate ROC curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)

# Plot ROC curve

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

Combining Pandas and Matplotlib in Machine Learning Workflows

1. Data Exploration with Pandas:

o Load the dataset, clean it, and explore it to understand the relationships between features.

2. Data Visualization with Matplotlib:

o Visualize the data using plots to better understand distributions and relationships between features, and to detect patterns or anomalies.

3. Model Training and Evaluation:

o Use Matplotlib to plot model performance metrics like accuracy, loss curves, confusion matrices, and ROC curves, and use Pandas to analyze prediction results (e.g., computing precision, recall, etc.).
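
The last step, analyzing prediction results with Pandas, might look like the sketch below. The actual and predicted labels are invented for illustration; in a real workflow they would be y_test and the model's predictions.

```python
import pandas as pd

# Invented labels for a binary classifier (1 = positive class).
results = pd.DataFrame({
    "actual":    [1, 0, 1, 1, 0, 0, 1, 0],
    "predicted": [1, 0, 0, 1, 0, 1, 1, 0],
})

# Cross-tabulate actual vs. predicted labels to get a confusion matrix.
confusion = pd.crosstab(results["actual"], results["predicted"])

tp = confusion.loc[1, 1]  # true positives
fp = confusion.loc[0, 1]  # false positives
fn = confusion.loc[1, 0]  # false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(confusion)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here both metrics come out to 0.75 (3 true positives, 1 false positive, 1 false negative). scikit-learn's classification_report gives the same figures, but the crosstab makes it easy to inspect the individual counts.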

Example: Complete Workflow with Pandas, Matplotlib, and an ML Model

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

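# Load the dataset from a CSV file (here, a local copy of the Iris data).
# Replace 'Iris.csv' with the path to your dataset; the column names used
# below ('sepal_length', 'sepal_width', 'species') must match its header.
df = pd.read_csv('Iris.csv')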

# Basic data cleaning and exploration

print(df.head())
print(df.describe())

# Visualizing relationships using scatter plot

plt.scatter(df['sepal_length'], df['sepal_width'],
c=df['species'].apply(lambda x: 0 if x == 'setosa' else 1))
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Sepal Length vs Sepal Width')
plt.show()

# Train a simple model (Logistic Regression)

X = df.drop('species', axis=1) # Features
y = df['species'] # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

# Visualizing Confusion Matrix

import seaborn as sns
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=model.classes_, yticklabels=model.classes_)
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

Conclusion
• Pandas is essential for data manipulation, cleaning, and preprocessing. It allows you to efficiently handle large datasets and perform operations like filtering, grouping, and merging data.
• Matplotlib is crucial for data visualization, making it easy to plot data distributions, trends, and performance metrics, which is important for both exploratory data analysis and model evaluation.
