Machine Learning Using Python

The document provides a comprehensive overview of NumPy and Pandas, two essential libraries for numerical data handling and data manipulation in Python. It covers array creation, indexing, reshaping, and mathematical operations in NumPy, as well as the structure and functionalities of Pandas Series and DataFrames. Additionally, it introduces data visualization techniques using Matplotlib and Seaborn, highlighting their importance in analyzing and presenting data.

Uploaded by

sanjufaujdar88

🔷 1. Creation of NumPy Arrays

NumPy arrays are efficient structures for handling numerical data. Unlike a Python list, an array
stores elements of a single data type, which makes computation faster. Arrays are used widely in
data science and machine learning.

🔹 Importing NumPy

python
import numpy as np

🔹 Types of Array Creation

| Method | Description | Example |
|---|---|---|
| np.array() | Converts a Python list or tuple to a NumPy array | np.array([1, 2, 3]) |
| np.zeros() | Creates an array filled with zeros | np.zeros((2, 3)) |
| np.ones() | Creates an array filled with ones | np.ones((2, 2)) |
| np.arange() | Creates evenly stepped values from start up to (not including) stop | np.arange(0, 10, 2) |
| np.linspace() | Creates a fixed number of evenly spaced values, endpoints included | np.linspace(0, 1, 5) |
| np.eye() | Creates an identity matrix | np.eye(3) |
| np.random.rand() | Uniform random values in the range [0, 1) | np.random.rand(2, 3) |
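The constructors in the table can be tried side by side. This is a small sketch; the variable names are only for illustration.

```python
import numpy as np

# Each constructor from the table, with a note on what it produces.
from_list = np.array([1, 2, 3])   # 1-D array from a Python list
zeros = np.zeros((2, 3))          # 2 rows x 3 columns of 0.0
ones = np.ones((2, 2))            # 2 x 2 array of 1.0
evens = np.arange(0, 10, 2)       # start 0, stop before 10, step 2
grid = np.linspace(0, 1, 5)       # 5 points, both endpoints included
identity = np.eye(3)              # 3 x 3 identity matrix

print(evens)         # [0 2 4 6 8]
print(grid)          # [0.   0.25 0.5  0.75 1.  ]
print(zeros.shape)   # (2, 3)
```

Note that np.arange stops before the end value, while np.linspace includes it.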

🔹 Example

python
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)

Output:

[[1 2 3]
 [4 5 6]]

🔷 2. Array Indexing in NumPy

Indexing is used to access or modify specific elements or rows/columns in an array.

🔹 1D Array Indexing

python
arr = np.array([10, 20, 30, 40])
print(arr[1]) # Output: 20

🔹 2D Array Indexing

python
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a[0][1]) # Output: 2 (row 0, column 1)

Alternatively:

python
print(a[0, 1]) # Same result

🔹 Negative Indexing

Access elements from the end.

python
arr = np.array([10, 20, 30])
print(arr[-1]) # Output: 30

🔹 Slicing

python
a = np.array([10, 20, 30, 40, 50])
print(a[1:4]) # Output: [20 30 40]

🔹 Indexing a 2D Array (Rows and Columns)

python
a = np.array([[10, 20, 30],
              [40, 50, 60]])

print(a[:, 1]) # All rows, 2nd column → [20 50]
print(a[1, :]) # 2nd row, all columns → [40 50 60]

📌 Why NumPy Arrays Are Important

• Faster than Python lists
• Support vectorized operations
• Used in ML for data storage, preprocessing, and feeding data into models
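To make "vectorized operations" concrete, here is a minimal sketch that squares the same numbers once with a Python list and once with a NumPy array. The data is made up for illustration.

```python
import numpy as np

values = list(range(1, 6))   # plain Python list: [1, 2, 3, 4, 5]
arr = np.array(values)

# Python list: an explicit loop handles one element at a time.
squares_list = [v * v for v in values]

# NumPy array: one expression is applied to every element (vectorized).
squares_arr = arr * arr

print(squares_list)   # [1, 4, 9, 16, 25]
print(squares_arr)    # [ 1  4  9 16 25]
```

Both give the same numbers; the array version needs no loop in Python code, which is where the speed advantage on large data comes from.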

NumPy: Reshaping Arrays and Array Math & Assignment


🔷 1. Reshaping Arrays

Reshaping means changing the shape (i.e., number of rows and columns) of an array without
changing its data.

🔹 Function Used:

python
array.reshape(new_shape)

🔹 Rules:

• Total number of elements must remain the same.
• Can convert between 1D, 2D, and 3D arrays.

🔹 Examples:

1D to 2D:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((2, 3))
print(b)

Output:

[[1 2 3]
 [4 5 6]]

2D to 1D:

python
a = np.array([[1, 2, 3], [4, 5, 6]])
b = a.reshape(-1)
print(b) # Output: [1 2 3 4 5 6]

Using -1 (Auto-calculate dimension):

python
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((3, -1)) # Automatically calculates column size
print(b)

Output:
[[1 2]
 [3 4]
 [5 6]]
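The rule that the total number of elements must stay the same can be checked directly. The sketch below reshapes 12 elements into a 3D array and then shows what happens when the sizes do not match.

```python
import numpy as np

a = np.arange(12)              # 12 elements: 0 .. 11

cube = a.reshape((2, 3, 2))    # 2 * 3 * 2 = 12, so this is allowed
print(cube.shape)              # (2, 3, 2)

reshape_error = None
try:
    a.reshape((5, 2))          # 5 * 2 = 10, which is not 12
except ValueError as err:
    reshape_error = str(err)
    print("reshape failed:", reshape_error)
```

NumPy raises a ValueError instead of dropping or inventing elements.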

🔷 2. Array Math and Assignment

NumPy allows element-wise operations and broadcasting, making it very efficient for
mathematical calculations.

🔹 Basic Math Operations (Element-wise)


python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b) # [5 7 9]
print(a - b) # [-3 -3 -3]
print(a * b) # [ 4 10 18]
print(a / b) # [0.25 0.4  0.5 ]

🔹 Mathematical Functions

python
np.sum(a) # Sum of all elements
np.mean(a) # Average
np.max(a) # Maximum
np.min(a) # Minimum
np.sqrt(a) # Element-wise square root
np.exp(a) # Element-wise exponential (e**x)
np.log(a) # Element-wise natural logarithm

🔹 Broadcasting

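Allows arithmetic between arrays of different but compatible shapes. Shapes are compared from the
last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.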

Example:

python
a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])
print(a + b)

Output:

[[11 22]
 [13 24]]
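A slightly larger sketch shows both directions of broadcasting and a case where the shapes are not compatible. The names row and col are only illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
row = np.array([10, 20])         # shape (2,): added to every row
col = np.array([[10], [20]])     # shape (2, 1): added to every column

by_row = a + row                 # rows become [11 22] and [13 24]
by_col = a + col                 # rows become [11 12] and [23 24]
print(by_row)
print(by_col)

try:
    a + np.array([1, 2, 3])      # shapes (2, 2) and (3,) do not line up
except ValueError as err:
    print("cannot broadcast:", err)
```

The last addition fails because the trailing dimensions (2 and 3) are neither equal nor 1.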

🔹 Assignment (Changing Values)

You can assign new values to elements or slices of an array.

python
a = np.array([10, 20, 30, 40])
a[0] = 99 # Change first element
a[1:3] = [55, 66] # Change second and third
print(a) # Output: [99 55 66 40]

In 2D:

python
b = np.array([[1, 2], [3, 4]])
b[1, 0] = 100 # Row 1, column 0 (same element as b[1][0])
print(b) # The second row is now 100, 4

📌 Summary

| Concept | Description |
|---|---|
| reshape() | Change array shape without changing data |
| Array Math | Perform element-wise operations |
| Broadcasting | Auto-align smaller array to match dimensions |
| Assignment | Modify values of elements directly |

Manipulating Tabular Data using Pandas

Pandas is a powerful Python library used for data analysis and manipulation. It provides two
main data structures:

🔷 1. Pandas Series

✅ What is a Series?

• A Series is a one-dimensional labeled array.
• Can hold any data type: integers, floats, strings, etc.
• Think of it like a column in Excel.

🔹 Creating a Series:

python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

• The left side is the index.
• The right side is the data values.

🔹 Custom Index:

python
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

🔹 Accessing Elements:

python
s['a'] # Output: 10 (access by label)
s.iloc[0] # Output: 10 (access by position; s[0] on a labeled Series is deprecated)
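Label-based and position-based access can be written explicitly with .loc and .iloc. A short sketch using the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.loc['b'])        # 20 (by label)
print(s.iloc[1])         # 20 (by position)

by_label = s.loc['a':'b']    # label slices include the end label: a and b
by_position = s.iloc[0:2]    # position slices exclude the end: positions 0 and 1
print(by_label.tolist())     # [10, 20]
print(by_position.tolist())  # [10, 20]
```

Using .loc and .iloc avoids any doubt about whether a number means a label or a position.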

🔹 Operations:

python
s + 5 # Adds 5 to each element
s.mean() # Average
s.max(), s.min() # Largest and smallest values

🔷 2. Pandas DataFrame

✅ What is a DataFrame?

• A DataFrame is a two-dimensional table (like a spreadsheet).
• Contains rows and columns, each with labels.
• Built from lists, dictionaries, Series, or NumPy arrays.

🔹 Creating a DataFrame:

python
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Score': [85, 90, 95]}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95

🔷 3. Manipulating DataFrames

🔹 Accessing Columns:

python
df['Name'] # Returns Series
df[['Name', 'Age']] # Multiple columns

🔹 Accessing Rows:

python
df.loc[0] # Row by label
df.iloc[1] # Row by position

🔹 Adding a New Column:

python
df['Grade'] = ['B', 'A', 'A+']

🔹 Filtering Rows (Conditional Selection):

python
df[df['Age'] > 28]

🔹 Updating Values:

python
df.at[0, 'Score'] = 88

🔹 Dropping Columns or Rows:

python
df.drop('Age', axis=1) # Returns a copy without the 'Age' column (df itself is unchanged)
df.drop(1, axis=0) # Returns a copy without the row labeled 1

🔹 Sorting:

python
df.sort_values(by='Score', ascending=False)

🔹 Grouping:

python
df.groupby('Grade')[['Age', 'Score']].mean() # Average the numeric columns only; 'Name' is text and cannot be averaged
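A worked grouping example makes the result easier to see. This sketch uses a small made-up table (the row for 'Dana' is added only so that each grade has two students).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],   # 'Dana' is a made-up extra row
    'Score': [85, 90, 95, 70],
    'Grade': ['B', 'A', 'A', 'B'],
})

# Split the rows by Grade, then average the Score column inside each group.
mean_by_grade = df.groupby('Grade')['Score'].mean()
print(mean_by_grade)
# Grade A: (90 + 95) / 2 = 92.5
# Grade B: (85 + 70) / 2 = 77.5
```

The result is a Series indexed by the group labels, so mean_by_grade['A'] gives the average for grade A.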

🔹 Missing Values:

python
df.isnull() # Check missing
df.dropna() # Drop rows with missing
df.fillna(0) # Fill missing with 0
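The three calls above can be tried on a table that really has a gap. In this sketch one score is missing (the data is made up), and the gap is filled with the column mean instead of 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85.0, np.nan, 95.0],     # Bob's score is missing
})

missing_count = df['Score'].isnull().sum()          # 1 missing value
dropped = df.dropna()                               # keeps only Alice and Charlie
filled = df.fillna({'Score': df['Score'].mean()})   # mean of 85 and 95 is 90

print(missing_count)
print(filled)
```

dropna() and fillna() return new DataFrames; df itself still contains the missing value afterwards.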

✅ Summary Table:

| Feature | Series | DataFrame |
|---|---|---|
| Structure | 1D | 2D (rows and columns) |
| Access | Index only | Index and Column |
| Creation | pd.Series() | pd.DataFrame() |
| Example Use | Single column | Complete table |

✅ Why Use Pandas for Tabular Data?

• Easy to load CSV/Excel files
• Clean and preprocess large datasets
• Group, summarize, filter, and sort
• Integrates with Matplotlib and Seaborn for visualization

Pandas Series and DataFrame

Pandas is a Python library used for working with structured (tabular) data. It provides two
main data structures:

🔷 1. Pandas Series

✅ What is a Series?

• A Series is a one-dimensional labeled array.
• It can hold data like integers, strings, floats, etc.
• Think of it like a single column from an Excel sheet.

✅ Key Points:

• Has data values and index labels
• Built on top of NumPy arrays

🔹 Example:

python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

• 0, 1, 2, 3 are index labels (can be changed)
• 10, 20, 30, 40 are values

🔹 With custom index:

python
s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

🔷 2. Pandas DataFrame

✅ What is a DataFrame?

• A DataFrame is a two-dimensional labeled data structure.
• It consists of rows and columns (like a table).
• Each column in a DataFrame is actually a Series.

✅ Key Features:

• Columns can have different data types
• Data can be accessed using row/column labels

🔹 Creating a DataFrame:

python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85, 90, 95]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
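The points that each column is a Series and that columns can have different data types can be checked on a similar table. This is a sketch; the scores are written as decimals here only to show a second numeric type.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.0, 95.0],   # decimals, so this column is float
})

age = df['Age']                  # selecting one column gives a Series
print(type(age).__name__)        # Series
print(age.name)                  # Age
print(df.dtypes)                 # one data type per column
print(df.loc[1, 'Name'])         # Bob (row label 1, column label 'Name')
```

df.dtypes lists a separate type for every column, which is what lets one table mix text and numbers.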
✅ Difference Between Series and DataFrame:

| Feature | Series | DataFrame |
|---|---|---|
| Structure | 1D | 2D (rows and columns) |
| Shape | Single column | Multiple rows and columns |
| Example | [10, 20, 30] | Table with rows and columns |
| Use | Represents a column | Represents full table of data |

What is Data Visualization?

Data Visualization means converting data into visual formats like graphs, charts, and plots.
This helps in understanding trends, patterns, and relationships in data.

What is Matplotlib?

Matplotlib is a popular Python library used to create static, interactive, and animated
visualizations.

• It works well with NumPy and Pandas.
• The main module used is:

python
import matplotlib.pyplot as plt

Basic Steps to Create a Graph

1. Import the library:

python
import matplotlib.pyplot as plt

2. Prepare the data:


python
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

3. Plot the graph:

python
plt.plot(x, y)

4. Show the graph:

python
plt.show()

Types of Charts in Matplotlib

1. Line Chart

Used to show trends over time.

python
plt.plot(x, y)

2. Bar Chart

Used to compare different categories.

python
plt.bar(x, y)

3. Histogram

Used to show distribution of a dataset.

python
plt.hist(data) # data: a list or array of numeric values

4. Pie Chart

Used to show percentage distribution.

python
plt.pie(values, labels=names) # values: slice sizes; names: one label per slice
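The histogram and pie snippets above use variables that are not defined in the notes. Below is a runnable sketch with made-up data. The "Agg" backend and saving to an in-memory buffer are assumptions so that the script also runs on a machine without a display; in a notebook or desktop session, plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

scores = [55, 62, 68, 70, 71, 75, 78, 80, 85, 93]   # made-up exam scores
names = ['A', 'B', 'C']                              # made-up categories
values = [30, 45, 25]

plt.figure(figsize=(8, 4))

plt.subplot(1, 2, 1)
counts, bin_edges, _ = plt.hist(scores, bins=4)   # 4 bins need 5 edges
plt.title("Histogram")

plt.subplot(1, 2, 2)
plt.pie(values, labels=names)
plt.title("Pie Chart")

buffer = io.BytesIO()
plt.savefig(buffer, format="png")   # write the figure as PNG bytes
```

plt.hist returns the count in each bin and the bin edges, which is handy for checking what the bars represent.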

Important Functions in Matplotlib


| Function | Purpose |
|---|---|
| plt.plot() | Draws line chart |
| plt.bar() | Draws bar chart |
| plt.hist() | Draws histogram |
| plt.pie() | Draws pie chart |
| plt.xlabel() | Sets x-axis label |
| plt.ylabel() | Sets y-axis label |
| plt.title() | Sets the title of the graph |
| plt.legend() | Adds a legend to the chart |
| plt.grid() | Shows grid lines |
| plt.show() | Displays the plot |
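Of the functions in the table, plt.legend() only has something to show when each line was given a label. A minimal sketch with two made-up data series follows; the "Agg" backend and the in-memory buffer are assumptions so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

plt.figure()
plt.plot(x, [10, 20, 25, 30], label="Product A")   # label is what the legend displays
plt.plot(x, [5, 15, 20, 35], label="Product B")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Monthly Sales")
legend = plt.legend()    # builds the legend from the labels above
plt.grid(True)

buffer = io.BytesIO()
plt.savefig(buffer, format="png")
```

Without the label arguments, plt.legend() would have no entries to display.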

Example with Title and Labels


python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [5, 10, 15, 20]

plt.plot(x, y)
plt.title("Simple Line Graph")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()

Why Use Matplotlib? (Advantages)

• Easy to use and flexible
• Can create many types of charts
• Good for basic as well as advanced plotting
• Works well with other Python libraries

✅ Example

Instead of writing the full module name on every call:

python
import matplotlib.pyplot

matplotlib.pyplot.plot([1,2,3], [4,5,6])
matplotlib.pyplot.show()

We use this:

python
import matplotlib.pyplot as plt

plt.plot([1,2,3], [4,5,6])
plt.show()

✅ Summary

| Part | Meaning |
|---|---|
| import | Load a library into Python |
| matplotlib.pyplot | The plotting part of the Matplotlib library |
| as plt | Give it a short name plt for convenience |

Data Visualization Using Seaborn

🔷 What is Seaborn?

• Seaborn is a Python visualization library based on Matplotlib.
• It provides a high-level interface for drawing attractive and informative statistical graphics.
• Works well with Pandas DataFrames.

✅ Importing Seaborn

python
import seaborn as sns
import matplotlib.pyplot as plt

🔹 1. Line Chart (Line Plot)

Used to show trends over time or continuous data.


python
import seaborn as sns
import matplotlib.pyplot as plt

# Example data (load_dataset downloads the dataset on first use, so it needs internet access)
data = sns.load_dataset("flights")
sns.lineplot(x="year", y="passengers", data=data)

plt.title("Line Chart")
plt.show()

🔹 2. Bar Chart

Used to compare categories.

python
import seaborn as sns
import matplotlib.pyplot as plt

# Example data
data = sns.load_dataset("titanic")
sns.barplot(x="class", y="fare", data=data)

plt.title("Bar Chart")
plt.show()

🔹 3. Pie Chart

🔸 Note: Seaborn does not support pie charts directly.
🔸 Use Matplotlib for pie charts:

python
import matplotlib.pyplot as plt

# Example data
labels = ['A', 'B', 'C']
sizes = [30, 45, 25]

plt.pie(sizes, labels=labels, autopct='%1.1f%%')


plt.title("Pie Chart")
plt.show()

🔹 4. Scatter Plot

Used to show relationship between two continuous variables.

python
import seaborn as sns
import matplotlib.pyplot as plt

# Example data
data = sns.load_dataset("iris")
sns.scatterplot(x="sepal_length", y="sepal_width", data=data)

plt.title("Scatter Plot")
plt.show()

🔹 5. Subplots (Multiple Plots in One Figure)

Use Matplotlib to create subplots, and use Seaborn for each plot.

python
import seaborn as sns
import matplotlib.pyplot as plt

# Load data
data = sns.load_dataset("iris")

# Create subplots
plt.figure(figsize=(10,5))

# Subplot 1
plt.subplot(1, 2, 1)
sns.histplot(data["sepal_length"])
plt.title("Sepal Length")

# Subplot 2
plt.subplot(1, 2, 2)
sns.histplot(data["sepal_width"])
plt.title("Sepal Width")

plt.tight_layout()
plt.show()

✅ Summary Table

| Chart Type | Seaborn Function | Notes |
|---|---|---|
| Line Chart | sns.lineplot() | Best for time-series data |
| Bar Chart | sns.barplot() | Good for categorical data |
| Pie Chart | Use matplotlib.pyplot.pie() | Seaborn does not support it |
| Scatter Plot | sns.scatterplot() | Good for 2D relationships |
| Subplots | Use plt.subplot() + sns | Combine multiple charts |



🔍 Difference Between Seaborn and Matplotlib

| Feature | Matplotlib | Seaborn |
|---|---|---|
| Library Type | Low-level visualization library | High-level visualization library built on Matplotlib |
| Import Syntax | import matplotlib.pyplot as plt | import seaborn as sns |
| Code Complexity | More code needed to customize | Less code; auto-styling and themes |
| Design | Simple/basic-looking plots by default | Polished, modern-looking plots by default |
| Data Format | Works with lists, arrays | Works best with Pandas DataFrames |
| Plot Types | Line, bar, scatter, pie, histogram, etc. | Line, bar, scatter, heatmap, violin, etc. |
| Customization | Very customizable but manual | Auto handles many settings like legends, colors |
| Themes | Not automatic | Built-in themes and styles (sns.set_theme()) |
| Advanced Stats Plots | Needs manual work | Built-in support (e.g. boxplot, violinplot) |
| Pie Chart Support | ✅ Supported | ❌ Not directly supported |

✅ Example Comparison

🔹 Matplotlib Line Plot:


python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [5, 7, 9]

plt.plot(x, y)
plt.title("Line Plot")
plt.show()

🔹 Seaborn Line Plot:


python
import seaborn as sns
import matplotlib.pyplot as plt

data = {'x': [1, 2, 3], 'y': [5, 7, 9]}


sns.lineplot(x='x', y='y', data=data)
plt.title("Line Plot")
plt.show()

🏁 Summary in One Line:

Matplotlib gives more control and is used for custom plots.


Seaborn makes it easier to create beautiful statistical plots quickly.

We use Matplotlib together with Seaborn because Matplotlib gives us control over extra things like:

| Function | Purpose |
|---|---|
| plt.title() | Set a custom title |
| plt.xlabel() / plt.ylabel() | Label axes |
| plt.xlim() / plt.ylim() | Set limits for axes |
| plt.subplot() | Create subplots |
| plt.grid() | Add gridlines |
| plt.legend() | Modify legends |
| plt.show() | Display the plot (final step) |

🧠 Think of it like this:

• Seaborn = Cooked dish 🍛 (ready-made, nice-looking plots)
• Matplotlib = Kitchen tools 🍳 (to adjust taste, style, layout, etc.)

You use Seaborn for easier plotting, and Matplotlib for final touch-ups and display.

✅ Example: Use Both Together


python
import seaborn as sns
import matplotlib.pyplot as plt
# Example data
data = sns.load_dataset("tips")

# Seaborn plot
sns.barplot(x="day", y="total_bill", data=data)

# Matplotlib enhancements
plt.title("Average Bill per Day")
plt.xlabel("Day of Week")
plt.ylabel("Total Bill")
plt.grid(True)
plt.show()

Here:

• Seaborn creates the bar chart.
• Matplotlib adds title, labels, grid, and shows the plot.

🟡 Final Answer:

We use Matplotlib with Seaborn to customize and display plots, because Seaborn doesn’t
replace Matplotlib — it extends it.

What is Scikit-learn?

• Scikit-learn (sklearn) is a popular Python library used for machine learning.
• It provides tools to:
  o Load datasets
  o Preprocess data
  o Build models (like classification, regression, clustering)
  o Evaluate model performance

✅ Key Features of Scikit-learn

| Feature | Description |
|---|---|
| Easy to use | Simple and consistent API |
| Built-in datasets | Has small datasets for practice (e.g. iris, digits) |
| Preprocessing tools | Normalize, scale, split, etc. |
| Many ML models | SVM, Decision Trees, KNN, Linear Regression |
| Model Evaluation | Accuracy score, confusion matrix, etc. |

🔹 How to Install

bash
pip install scikit-learn
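Once installed, the four kinds of tools (load, preprocess, build, evaluate) fit together in a few lines. The following is a minimal sketch, not the only way to do it; the choice of the iris dataset, a KNN classifier with 5 neighbors, and a 25% test split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# 1. Load a built-in dataset: 150 flowers, 4 measurements each, 3 classes.
X, y = load_iris(return_X_y=True)

# 2. Preprocess: hold out 25% for testing, then scale the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_train)      # learn the scaling from training data only

# 3. Build a model.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(scaler.transform(X_train), y_train)

# 4. Evaluate on data the model has not seen.
predictions = model.predict(scaler.transform(X_test))
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Other models follow the same fit / predict pattern, which is what "simple and consistent API" refers to.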

What is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that allows computers to
learn from data and make decisions or predictions without being explicitly programmed.

📌 Example:

If you give a machine many images of cats and dogs, it can learn the patterns and later predict
whether a new image is of a cat or a dog.

🧠 Why is Machine Learning Important?

• Can automate tasks (like spam detection)
• Learns and improves from experience
• Helps in predictions, recommendations, classification, etc.
• Used in self-driving cars, recommendation systems, medical diagnosis, etc.

🔍 Classification of Machine Learning

ML is mainly divided into 3 types:

1️⃣ Supervised Learning

• Data is labeled (we know the input and correct output).
• The model learns from examples.

📘 Examples:

| Task | Description |
|---|---|
| Email Spam Filter | "Spam" or "Not Spam" |
| Disease Detection | "Positive" or "Negative" result |
| Price Prediction | Predict house price based on size |

✅ Algorithms: Linear Regression, Decision Tree, KNN, SVM
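As an illustration of supervised learning, the price-prediction task can be sketched with Linear Regression. The sizes and prices below are made up and follow an exact straight line (price = 100 * size + 5000), so the learned values are easy to check; real data would be noisy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled examples: input (size) and correct output (price). Made-up numbers.
sizes = np.array([[50], [80], [100], [120], [150]])   # one feature per row
prices = 100 * sizes.ravel() + 5000

model = LinearRegression().fit(sizes, prices)          # learn from the examples

predicted = model.predict(np.array([[110]]))[0]        # a size the model has not seen
print(round(model.coef_[0], 2))    # learned slope, close to 100
print(round(predicted, 2))         # close to 100 * 110 + 5000 = 16000
```

The model recovers the rule from the labeled examples and applies it to a new input.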

2️⃣ Unsupervised Learning

• Data is not labeled (no predefined output).
• The model tries to find hidden patterns in data.

📘 Examples:

| Task | Description |
|---|---|
| Customer Segmentation | Group similar customers automatically |
| Market Basket Analysis | Find items bought together often |

✅ Algorithms: K-Means Clustering, Hierarchical Clustering, PCA
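Customer segmentation can be sketched with K-Means. The customers below are made up, with two clearly separated groups, and the number of clusters (2) is chosen by hand for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [annual spend, visits per month]. No labels are given.
customers = np.array([
    [200, 2], [220, 3], [210, 2],       # low spenders
    [900, 12], [950, 10], [920, 11],    # high spenders
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
labels = kmeans.labels_
print(labels)   # cluster ids are arbitrary; only the grouping matters
```

The algorithm is never told which customers belong together; it finds the two groups from the numbers alone.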

3️⃣ Reinforcement Learning

• The model learns by trial and error.
• It gets rewards or penalties based on actions.

📘 Examples:

| Task | Description |
|---|---|
| Game Playing AI | Learns to play chess or Atari games |
| Self-driving Car | Learns how to drive safely |

✅ Concepts: Agent, Environment, Reward, Policy
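The four concepts can be seen in a tiny example. The notes do not name an algorithm, so the sketch below uses tabular Q-learning as one common choice. The environment (a corridor of 5 cells with a reward at the last cell) and all parameter values are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, goal = 5, 4             # environment: corridor cells 0..4, reward at cell 4
LEFT, RIGHT = 0, 1                # the two actions available to the agent
q = np.zeros((n_states, 2))       # q[state, action]: the agent's value estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal is reached."""
    nxt = min(state + 1, goal) if action == RIGHT else max(state - 1, 0)
    return nxt, (1.0 if nxt == goal else 0.0)

for episode in range(200):
    state = 0
    for _ in range(100):          # step cap per episode
        # Trial and error: act randomly sometimes, and whenever both actions look equal.
        explore = rng.random() < epsilon or q[state, LEFT] == q[state, RIGHT]
        action = int(rng.integers(2)) if explore else int(np.argmax(q[state]))
        nxt, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best next value.
        q[state, action] += alpha * (reward + gamma * q[nxt].max() - q[state, action])
        state = nxt
        if state == goal:
            break

policy = q[:goal].argmax(axis=1)  # policy: best-known action in each non-goal cell
print(policy)                     # 1 means "move right"
```

Here the agent is the code choosing actions, the environment is the step function, the reward is the 1.0 at the goal, and the policy is the best action per cell read from the table q.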


Summary Table

| Type | Data Type | Learns From | Goal | Examples |
|---|---|---|---|---|
| Supervised | Labeled | Past examples | Predict output | Spam filter, disease detection |
| Unsupervised | Unlabeled | Hidden patterns | Group or reduce data | Clustering, market analysis |
| Reinforcement | Feedback | Trial & error | Maximize rewards | Robot control, game AI |
