IML Lab Manual


DCE

GUJARAT TECHNOLOGICAL UNIVERSITY

Chandkheda, Ahmedabad

B. H. Gardi College of Engineering & Technology

Subject : Introduction to Machine Learning

(4350702)

D.E. , Semester - V , Computer Engineering.

LAB MANUAL

Prof. Arkesh Vora

(Faculty Guide)

Prof. Monika Shah

(Head of the Department)

Academic Year

(2024-2025)


B. H. Gardi College of Engineering & Technology

CERTIFICATE

This is to certify that Mr. / Miss.

of semester branch Enrollment no. has satisfactorily

completed his/her term work in the subject for the term ending

in 2024.

Date:

Prof. Arkesh Vora Prof. Monika Shah


Subject Incharge Head of Department


Index

S. No.  Practical Outcomes (PrOs)  Page No.  Sign

1. Explore any one machine learning tool. (like Weka, Tensorflow, Scikit-learn, Colab, etc.)

2. Write a NumPy program to implement following operation
   ➢ to convert a list of numeric values into a one-dimensional NumPy array
   ➢ to create a 3x3 matrix with values ranging from 2 to 10
   ➢ to append values at the end of an array
   ➢ to create another shape from an array without changing its data (3*2 to 2*3)

3. Write a NumPy program to implement following operation
   ➢ to split an array of 14 elements into 3 arrays, each with 2, 4, and 8 elements in the original order
   ➢ to stack arrays horizontally (column wise)

4. Write a NumPy program to implement following operation
   ➢ to add, subtract, multiply, divide arguments element-wise
   ➢ to round elements of the array to the nearest integer
   ➢ to calculate mean across dimension, in a 2D numpy array
   ➢ to calculate the difference between neighboring elements, element-wise of a given array

5. Write a NumPy program to implement following operation
   ➢ to find the maximum and minimum value of a given flattened array
   ➢ to compute the mean, standard deviation, and variance of a given array along the second axis

6. Write a Pandas program to implement following operation
   ➢ to convert a NumPy array to a Pandas series
   ➢ to convert the first column of a DataFrame as a Series
   ➢ to create the mean and standard deviation of the data of a given Series
   ➢ to sort a given Series

7. Write a Pandas program to implement following operation
   ➢ to create a dataframe from a dictionary and display it
   ➢ to sort the DataFrame first by 'name' in ascending order
   ➢ to delete the one specific column from the DataFrame
   ➢ to write a DataFrame to CSV file using tab separator

8. Write a Pandas program to create a line plot of the opening, closing stock prices of given company between two specific dates.

9. Write a Pandas program to create a plot of Open, High, Low, Close, Adjusted Closing prices and Volume of given company between two specific dates.

10. Write a Pandas program to implement following operation
   ➢ to find and drop the missing values from the given dataset
   ➢ to remove the duplicates from the given dataset

11. Write a Pandas program to filter all columns where all entries present, check which rows and columns has a NaN and finally drop rows with any NaNs from the given dataset.

12. Write a Python program using Scikit-learn to print the keys, number of rows-columns, feature names and the description of the given data.

13. Write a Python program to implement K-Nearest Neighbour supervised machine learning algorithm for given dataset.

14. Write a Python program to implement a machine learning algorithm for given dataset. (It is recommended to assign different machine learning algorithms group wise – micro project)


Practical - 1
➢ Explore any one machine learning tool. (like Weka, Tensorflow, Scikit-learn, Colab, etc.)

Overview of Scikit-learn:

⚫ Scikit-learn is an open-source library that provides simple and efficient tools for data
mining, data analysis, and machine learning. It is built on top of other Python libraries such as
NumPy, SciPy, and matplotlib, making it an excellent choice for machine learning tasks. It is
widely used for creating predictive models, performing data analysis, and feature extraction.

Key Features:
1. Classification: Identifying which category an object belongs to (e.g., spam detection).
2. Regression: Predicting continuous-valued attributes associated with an object (e.g.,
predicting prices).
3. Clustering: Grouping similar objects together (e.g., customer segmentation).
4. Dimensionality Reduction: Reducing the number of random variables to consider (e.g.,
principal component analysis).
5. Model Selection: Comparing, validating, and choosing models with different parameters.
6. Preprocessing: Feature extraction and normalization for data preparation.

Commonly Used Algorithms in Scikit-learn:


a. Linear models: Linear Regression, Logistic Regression.
b. Tree-based models: Decision Trees, Random Forest, Gradient Boosting.
c. Support Vector Machines (SVM).
d. K-Nearest Neighbors (KNN).
e. Clustering models: K-Means, DBSCAN.
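
The models listed above share one estimator interface (`fit`, `predict`, `score`), so one can be swapped for another with very little code change. A minimal sketch, assuming the Iris dataset bundled with scikit-learn and mostly default hyperparameters (both chosen here purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: the Iris dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three different model families, one common interface
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on the training split
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: {scores[name]:.3f}")
```

Because every estimator exposes the same methods, comparing models is mostly a matter of changing the entries in the dictionary.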

Basic Workflow:
i. Importing data: Load and prepare your dataset (usually as a NumPy array or pandas
DataFrame).
ii. Splitting data: Divide your dataset into training and test sets using train_test_split().
iii. Model selection: Choose an appropriate model (e.g., LinearRegression,
DecisionTreeClassifier).


iv. Training: Fit the model on the training data using .fit().
v. Prediction: Use the model to predict results on the test data with .predict().

vi. Evaluation: Measure the model’s performance using accuracy metrics like R² score,
confusion matrix, etc.
Why Use Scikit-learn?

1. Easy to use: Its API is intuitive and beginner-friendly.


2. Extensive documentation: Detailed guides and examples make it easy to get started.
3. Wide community support: A large community contributes to its development, ensuring
regular updates.
4. Versatility: It supports a broad range of models for various machine learning tasks.

1.1 Scikit-learn example

⚫ Input:

from sklearn.model_selection import train_test_split


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Sample data (X: independent variables, y: target)


X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Initialize the model


model = LinearRegression()
# Train the model
model.fit(X_train, y_train)

# Predict values using the test set


y_pred = model.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

⚫ Output:

Mean Squared Error: 0.0
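
The workflow also mentions the R² score. With only five samples and a 20% split, the test set holds a single point, for which R² is not well defined. The sketch below therefore uses a larger synthetic dataset (an assumption made for illustration, not part of the original exercise) and fixes `random_state` so that the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=50)

# A fixed random_state gives the same split on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Slope: {model.coef_[0]:.2f}, Intercept: {model.intercept_:.2f}")
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```

An R² close to 1 means the line explains almost all of the variation in the test targets; the MSE reports the same fit in squared target units.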


Practical - 2

➢ Write a NumPy program to implement following operation


2.1 to convert a list of numeric values into a one-dimensional NumPy array

⚫ Input:

import numpy as np

# List of numeric values


lst = [10, 20, 30, 40, 50]

# Convert the list into a NumPy array


array_1d = np.array(lst)

print("1D NumPy array:", array_1d)

⚫ Output:

1D NumPy array: [10 20 30 40 50]

2.2 to create a 3x3 matrix with values ranging from 2 to 10

⚫ Input:

import numpy as np
# Create a 3x3 matrix with values ranging from 2 to 10
matrix_3x3 = np.arange(2, 11).reshape(3, 3)

print("3x3 matrix with values from 2 to 10:\n", matrix_3x3)

⚫ Output:

3x3 matrix with values from 2 to 10:
 [[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]

2.3 to append values at the end of an array

⚫ Input:

import numpy as np

# Original array
arr = np.array([1, 2, 3])

# Append values to the end of the array


new_arr = np.append(arr, [4, 5, 6])


print("Array after appending values:", new_arr)

⚫ Output:

Array after appending values: [1 2 3 4 5 6]

2.4 to create another shape from an array without changing its data(3*2 to 2*3)

⚫ Input:

import numpy as np

# Original 3x2 array


arr_3x2 = np.array([[1, 2], [3, 4], [5, 6]])

# Reshape to 2x3
reshaped_arr = arr_3x2.reshape(2, 3)

print("Reshaped array (2x3):\n", reshaped_arr)

⚫ Output:

Reshaped array (2x3):
 [[1 2 3]
 [4 5 6]]


Practical - 3

➢ Write a NumPy program to implement following operation

3.1 to split an array of 14 elements into 3 arrays, each with 2, 4, and 8 elements in the original
order

⚫ Input:

import numpy as np

# Create an array of 14 elements


arr = np.arange(1, 15)

# Split the array into 3 arrays with 2, 4, and 8 elements


arr_split = np.split(arr, [2, 6])

print("Split arrays:")
for part in arr_split:
    print(part)

⚫ Output:

Split arrays:
[1 2]
[3 4 5 6]
[ 7  8  9 10 11 12 13 14]
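
In `np.split(arr, [2, 6])` the list holds cut positions, not piece sizes: the array is cut before index 2 and before index 6, giving `arr[:2]`, `arr[2:6]`, and `arr[6:]`. A minimal sketch that derives the cut positions from the desired sizes; the helper name `split_by_sizes` is hypothetical and not part of NumPy:

```python
import numpy as np

def split_by_sizes(arr, sizes):
    """Split arr into consecutive pieces with the given lengths."""
    if sum(sizes) != len(arr):
        raise ValueError("sizes must add up to the array length")
    # Cumulative sums of all but the last size are the cut positions
    cut_points = np.cumsum(sizes)[:-1]
    return np.split(arr, cut_points)

arr = np.arange(1, 15)
parts = split_by_sizes(arr, [2, 4, 8])
for part in parts:
    print(part)
```

For sizes 2, 4, and 8 the cumulative sums are 2, 6, and 14; dropping the last one leaves the cut positions 2 and 6 used in the program above.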

3.2 to stack arrays horizontally (column wise)

⚫ Input:

import numpy as np

# Create two arrays to stack horizontally


arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# Stack the arrays horizontally (column-wise)


arr_hstack = np.hstack((arr1, arr2))

print("Horizontally stacked arrays:\n", arr_hstack)

⚫ Output:

Horizontally stacked arrays:
 [[1 2 5 6]
 [3 4 7 8]]


Practical - 4

➢ Write a NumPy program to implement following operation

4.1 to add, subtract, multiply, divide arguments element-wise

⚫ Input:

import numpy as np

# Define two arrays


arr1 = np.array([10, 20, 30, 40])
arr2 = np.array([1, 2, 3, 4])

# Element-wise operations
add_result = np.add(arr1, arr2)
subtract_result = np.subtract(arr1, arr2)
multiply_result = np.multiply(arr1, arr2)
divide_result = np.divide(arr1, arr2)

print("Element-wise Addition:", add_result)


print("Element-wise Subtraction:", subtract_result)
print("Element-wise Multiplication:", multiply_result)
print("Element-wise Division:", divide_result)

⚫ Output:

Element-wise Addition: [11 22 33 44]


Element-wise Subtraction: [ 9 18 27 36]
Element-wise Multiplication: [ 10 40 90 160]
Element-wise Division: [10. 10. 10. 10.]

4.2 to round elements of the array to the nearest integer

⚫ Input:

import numpy as np

# Define an array with floating-point numbers


arr = np.array([1.5, 2.8, 3.3, 4.6])

# Round the elements to the nearest integer


rounded_arr = np.rint(arr)

print("Rounded Array:", rounded_arr)


⚫ Output:

Rounded Array: [2. 3. 3. 5.]
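
Note that `np.rint` (like `np.round`) rounds exact halves to the nearest even integer, which is why 1.5 becomes 2 here while 2.5 would also become 2. If the conventional "round half up" rule is wanted, `np.floor(x + 0.5)` is one common workaround. The sketch below compares the two on illustrative values.

```python
import numpy as np

# Illustrative values that sit exactly halfway between two integers
values = np.array([0.5, 1.5, 2.5, 3.5, -0.5])

half_to_even = np.rint(values)      # ties go to the even neighbour
half_up = np.floor(values + 0.5)    # ties go upward

print("np.rint:       ", half_to_even)
print("floor(x + 0.5):", half_up)
```

The two results differ only on ties, so the choice matters mainly for data with many values ending in .5.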

4.3 to calculate mean across dimension, in a 2D numpy array

⚫ Input:

import numpy as np

# Define a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Calculate the mean across rows (axis=1) and columns (axis=0)


mean_across_rows = np.mean(arr_2d, axis=1)
mean_across_columns = np.mean(arr_2d, axis=0)

print("Mean across rows:", mean_across_rows)


print("Mean across columns:", mean_across_columns)

⚫ Output:

Mean across rows: [2. 5.]


Mean across columns: [2.5 3.5 4.5]

4.4 to calculate the difference between neighboring elements, element- wise of a given array

⚫ Input:

import numpy as np

# Define an array
arr = np.array([10, 20, 30, 40, 50])

# Calculate the difference between neighboring elements


diff_arr = np.diff(arr)

print("Element-wise difference between neighboring elements:", diff_arr)

⚫ Output:

Element-wise difference between neighboring elements: [10 10 10 10]

Practical - 5

➢ Write a NumPy program to implement following operation

5.1 to find the maximum and minimum value of a given flattened array

⚫ Input:

import numpy as np

# Define a 2D array
arr_2d = np.array([[3, 7, 5], [8, 4, 2], [9, 6, 1]])

# Flatten the array, then find its maximum and minimum
flat_arr = arr_2d.flatten()
max_value = np.max(flat_arr)
min_value = np.min(flat_arr)

print("Maximum value in flattened array:", max_value)


print("Minimum value in flattened array:", min_value)

⚫ Output:

Maximum value in flattened array: 9


Minimum value in flattened array: 1

5.2 to compute the mean, standard deviation, and variance of a given array along the
second axis

⚫ Input:

import numpy as np
# Define a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Compute mean, standard deviation, and variance along the second axis (rows)
mean_along_axis2 = np.mean(arr_2d, axis=1)
std_dev_along_axis2 = np.std(arr_2d, axis=1)
variance_along_axis2 = np.var(arr_2d, axis=1)

print("Mean along the second axis (rows):", mean_along_axis2)


print("Standard Deviation along the second axis (rows):", std_dev_along_axis2)
print("Variance along the second axis (rows):", variance_along_axis2)

⚫ Output:

Mean along the second axis (rows): [2. 5. 8.]


Standard Deviation along the second axis (rows): [0.81649658 0.81649658 0.81649658]
Variance along the second axis (rows): [0.66666667 0.66666667 0.66666667]


Practical - 6

➢ Write a Pandas program to implement following operation

6.1 to convert a NumPy array to a Pandas series

⚫ Input:

import numpy as np
import pandas as pd

# Create a NumPy array
np_array = np.array([10, 20, 30, 40, 50])

# Convert the NumPy array to a Pandas Series
series = pd.Series(np_array)

print("NumPy array:", np_array)
print("Pandas Series:")
print(series)

⚫ Output:

NumPy array: [10 20 30 40 50]
Pandas Series:
0 10
1 20
2 30
3 40
4 50
dtype: int64

6.2 to convert the first column of a DataFrame as a Series

⚫ Input:

# Create a DataFrame
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)

# Convert the first column of the DataFrame to a Series


first_column_series = df.iloc[:, 0] # Using iloc to select the first column

print("First column as Series:")


print(first_column_series)


⚫ Output:

First column as Series:


0 1
1 2
2 3
3 4
Name: A, dtype: int64

6.3 to create the mean and standard deviation of the data of a given Series
⚫ Input:

# Define a Pandas Series


series_data = pd.Series([10, 20, 30, 40, 50])

# Calculate mean and standard deviation


mean_value = series_data.mean()
std_dev_value = series_data.std()

print("Mean of the Series:", mean_value)


print("Standard Deviation of the Series:", std_dev_value)

⚫ Output:

Mean of the Series: 30.0


Standard Deviation of the Series: 15.811388300841896
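
The value 15.81 is the sample standard deviation: pandas divides by n - 1 by default (`ddof=1`), whereas `np.std` divides by n (`ddof=0`), as in Practical 5.2. A short sketch showing how to obtain either result from both libraries:

```python
import numpy as np
import pandas as pd

series_data = pd.Series([10, 20, 30, 40, 50])

sample_std = series_data.std()            # pandas default: ddof=1 (divide by n - 1)
population_std = series_data.std(ddof=0)  # divide by n, like np.std's default

print("pandas std (ddof=1):", sample_std)
print("pandas std (ddof=0):", population_std)
print("NumPy std (ddof=0): ", np.std(series_data.to_numpy()))
print("NumPy std (ddof=1): ", np.std(series_data.to_numpy(), ddof=1))
```

When comparing NumPy and pandas results, passing `ddof` explicitly avoids this mismatch.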

6.4 to sort a given Series

⚫ Input:

# Define a Pandas Series


series_to_sort = pd.Series([10, 30, 20, 50, 40])

# Sort the Series


sorted_series = series_to_sort.sort_values()

print("Sorted Series:")
print(sorted_series)

⚫ Output:

Sorted Series:
0 10
2 20
1 30
4 40
3 50
dtype: int64

Practical - 7

➢ Write a Pandas program to implement following operation

7.1 to create a dataframe from a dictionary and display it

⚫ Input:

import pandas as pd

# Create a dictionary
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David'],
'age': [24, 27, 22, 32],
'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

# Create a DataFrame from the dictionary


df = pd.DataFrame(data)

# Display the DataFrame


print("DataFrame from dictionary:")
print(df)

⚫ Output:

DataFrame from dictionary:


name age city
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston

7.2 to sort the DataFrame first by 'name' in ascending order

⚫ Input:

import pandas as pd

# Sort the DataFrame by 'name' in ascending order


sorted_df = df.sort_values(by='name')

print("DataFrame sorted by 'name':")


print(sorted_df)


⚫ Output:

DataFrame sorted by 'name':


name age city
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
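
`sort_values` also accepts several keys and a direction per key, which is what sorting "first by name" hints at. A minimal sketch, assuming a small hypothetical DataFrame with a repeated name so that the second key matters:

```python
import pandas as pd

# Hypothetical data with a repeated name
df = pd.DataFrame({
    'name': ['Charlie', 'Alice', 'Bob', 'Alice'],
    'age': [22, 30, 27, 24],
})

# Sort by 'name' ascending, then by 'age' descending within each name
sorted_df = df.sort_values(by=['name', 'age'], ascending=[True, False])
print(sorted_df)

# reset_index(drop=True) renumbers the rows after sorting
print(sorted_df.reset_index(drop=True))
```

The original row labels are kept after sorting, which is why `reset_index(drop=True)` is often applied afterwards.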

7.3 to delete the one specific column from the DataFrame

⚫ Input:

import pandas as pd

# Delete the 'city' column from the DataFrame


df_dropped = df.drop(columns=['city'])

print("DataFrame after dropping 'city' column:")


print(df_dropped)

⚫ Output:

DataFrame after dropping 'city' column:


name age
0 Alice 24
1 Bob 27
2 Charlie 22
3 David 32

7.4 to write a DataFrame to CSV file using tab separator

⚫ Input:

import pandas as pd
# Write the DataFrame to a CSV file using tab as a separator
df.to_csv('output_data.csv', sep='\t', index=False)

print("DataFrame has been written to 'output_data.csv' with tab separator.")

⚫ Output:

DataFrame has been written to 'output_data.csv' with tab separator.


Contents of output_data.csv:

name	age	city
Alice	24	New York
Bob	27	Los Angeles
Charlie	22	Chicago
David	32	Houston
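
To check the file, it can be read back with the same separator. A minimal sketch; it writes to a temporary directory (an assumption made here so the example leaves no files behind) instead of the working directory:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [24, 27, 22, 32],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output_data.csv')
    df.to_csv(path, sep='\t', index=False)      # write tab-separated values

    with open(path) as f:
        first_line = f.readline().rstrip('\n')  # raw header line of the file
    df_read = pd.read_csv(path, sep='\t')       # read back with the same separator

print(repr(first_line))
print(df_read)
```

Reading the file without `sep='\t'` would place each whole line in a single column, because `read_csv` expects commas by default.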
Practical - 8

➢ Write a Pandas program to create a line plot of the opening, closing stock prices of given company between two specific dates.

⚫ Input:

import pandas as pd
import matplotlib.pyplot as plt

# Sample data for stock prices


data = {
'date': pd.date_range(start='2024-01-01', end='2024-01-10'),
'opening_price': [150.0, 152.5, 153.0, 151.5, 154.0, 155.5, 157.0, 156.0, 158.5, 159.0],
'closing_price': [151.0, 153.0, 152.0, 155.0, 156.5, 158.0, 159.0, 157.5, 160.0, 161.0]
}

# Create a DataFrame from the data


df = pd.DataFrame(data)

# Set the 'date' column as the index


df.set_index('date', inplace=True)

# Specify the date range


start_date = '2024-01-02'
end_date = '2024-01-08'

# Filter the DataFrame for the specified date range


filtered_df = df.loc[start_date:end_date]

# Create a line plot


plt.figure(figsize=(10, 5))
plt.plot(filtered_df.index, filtered_df['opening_price'], label='Opening Price', marker='o')
plt.plot(filtered_df.index, filtered_df['closing_price'], label='Closing Price', marker='o')

# Add title and labels


plt.title('Stock Prices from {} to {}'.format(start_date, end_date))
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.xticks(rotation=45)
plt.legend()
plt.grid()

# Show the plot


plt.tight_layout()
plt.show()


⚫ Output:


Practical - 9

➢ Write a Pandas program to create a plot of Open, High, Low, Close, Adjusted Closing prices and Volume of given company between two specific dates.

⚫ Input:

import pandas as pd
import matplotlib.pyplot as plt

# Sample data for stock prices


data = {
'date': pd.date_range(start='2024-01-01', end='2024-01-10'),
'open': [150.0, 152.5, 153.0, 151.5, 154.0, 155.5, 157.0, 156.0, 158.5, 159.0],
'high': [152.0, 154.5, 155.0, 154.0, 157.0, 158.5, 160.0, 158.5, 162.0, 163.0],
'low': [149.0, 151.0, 152.0, 150.5, 153.0, 154.5, 156.0, 155.0, 157.5, 158.0],
'close': [151.0, 153.0, 152.0, 155.0, 156.5, 158.0, 159.0, 157.5, 160.0, 161.0],
'adj_close': [150.5, 152.8, 151.5, 154.7, 156.2, 157.7, 158.7, 157.2, 159.5, 160.8],
'volume': [1000000, 1200000, 1300000, 1100000, 1500000, 1600000, 1700000, 1400000,
1800000, 1900000]
}

# Create and filter DataFrame by date range


df = pd.DataFrame(data).set_index('date').loc['2024-01-02':'2024-01-08']

# Create subplots
fig, axes = plt.subplots(6, 1, figsize=(10, 12), sharex=True)
cols = ['open', 'high', 'low', 'close', 'adj_close', 'volume']
colors = ['blue', 'green', 'red', 'purple', 'orange', 'grey']
titles = ['Open Price', 'High Price', 'Low Price', 'Close Price', 'Adjusted Close Price', 'Volume']

for i, col in enumerate(cols):
    # Volume is drawn as bars; every price column is drawn as a line
    if col == 'volume':
        axes[i].bar(df.index, df[col], color=colors[i])
    else:
        axes[i].plot(df.index, df[col], marker='o', color=colors[i])
    axes[i].set_title(titles[i])
    axes[i].set_ylabel('Price (USD)' if col != 'volume' else 'Volume')
    axes[i].grid(True)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

⚫ Output:


Practical - 10

➢ Write a Pandas program to implement following operation

10.1 to find and drop the missing values from the given dataset

⚫ Input:

import pandas as pd

# Sample data with missing values


data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, None, 32, 29],
'City': ['New York', None, 'Chicago', 'Houston', 'Boston']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the DataFrame with missing values


print("Original DataFrame with missing values:")
print(df)

# Drop rows with missing values


df_cleaned = df.dropna()

print("\nDataFrame after dropping missing values:")


print(df_cleaned)

⚫ Output:

Original DataFrame with missing values:
Name Age City
0 Alice 24.0 New York
1 Bob 27.0 None
2 Charlie NaN Chicago
3 David 32.0 Houston
4 Eve 29.0 Boston

DataFrame after dropping missing values:
Name Age City
0 Alice 24.0 New York
3 David 32.0 Houston
4 Eve 29.0 Boston

10.2 to remove the duplicates from the given dataset

⚫ Input:

import pandas as pd

# Sample data with duplicate rows
data_with_duplicates = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Bob'],
'Age': [24, 27, 22, 32, 29, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston', 'Los Angeles']
}

# Create a DataFrame
df_duplicates = pd.DataFrame(data_with_duplicates)

# Display the original DataFrame with duplicates
print("\nOriginal DataFrame with duplicates:")
print(df_duplicates)

# Remove duplicate rows
df_no_duplicates = df_duplicates.drop_duplicates()

print("\nDataFrame after removing duplicates:")
print(df_no_duplicates)

⚫ Output:

Original DataFrame with duplicates:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
4 Eve 29 Boston
5 Bob 27 Los Angeles

DataFrame after removing duplicates:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
4 Eve 29 Boston
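
Both `dropna` and `drop_duplicates` take options that control what counts as missing or duplicated. The sketch below, on hypothetical data, uses `subset` to restrict the check to chosen columns and `keep` to choose which duplicate survives.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and a repeated name
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'Age': [24, 27, 28, np.nan],
    'City': ['New York', None, 'Boston', 'Chicago'],
})

# Count missing values per column before dropping anything
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop a row only if 'Age' is missing (a missing 'City' is tolerated)
age_known = df.dropna(subset=['Age'])
print(age_known)

# Treat rows with the same 'Name' as duplicates and keep the last one
unique_names = df.drop_duplicates(subset=['Name'], keep='last')
print(unique_names)
```

Counting missing values first shows how much data each cleaning step would discard.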


Practical - 11

➢ Write a Pandas program to filter all columns where all entries present, check which rows and columns has a NaN and finally drop rows with any NaNs from the given dataset.

⚫ Input:

import pandas as pd

# Sample data with missing values


data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, None, 32, 29],
'City': ['New York', None, 'Chicago', 'Houston', 'Boston'],
'Score': [85, 88, 90, None, 93]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the original DataFrame


print("Original DataFrame:")
print(df)

### 1. Filter columns where all entries are present (no NaNs)
columns_no_nan = df.dropna(axis=1, how='any')
print("\nColumns with no missing values:")
print(columns_no_nan)

### 2. Check which rows and columns have NaN values


nan_locations = df.isna()
print("\nLocations with NaN values (True indicates NaN):")
print(nan_locations)

### 3. Drop rows with any NaNs


df_cleaned = df.dropna()
print("\nDataFrame after dropping rows with any NaN values:")
print(df_cleaned)

⚫ Output:

Original DataFrame:
Name Age City Score
0 Alice 24.0 New York 85.0
1 Bob 27.0 None 88.0
2 Charlie NaN Chicago 90.0
3 David 32.0 Houston NaN
4 Eve 29.0 Boston 93.0


Columns with no missing values:


Name
0 Alice
1 Bob
2 Charlie
3 David
4 Eve

Locations with NaN values (True indicates NaN):


Name Age City Score
0 False False False False
1 False False True False
2 False True False False
3 False False False True
4 False False False False

DataFrame after dropping rows with any NaN values:


Name Age City Score
0 Alice 24.0 New York 85.0


Practical - 12

➢ Write a Python program using Scikit-learn to print the keys, number of rows-columns, feature names and the description of the given data.

⚫ Input:

from sklearn.datasets import load_iris # You can replace this with another dataset

# Load the Iris dataset


data = load_iris()

# 1. Print the keys of the dataset


print("Keys of the dataset:")
print(data.keys())

# 2. Print the number of rows and columns


# The 'data' key contains the feature data as a 2D array (rows, columns)
rows, columns = data['data'].shape
print("\nNumber of rows and columns:")
print(f"Rows: {rows}, Columns: {columns}")

# 3. Print the feature names


print("\nFeature names:")
print(data['feature_names'])

# 4. Print the description of the dataset


print("\nDataset description:")
print(data['DESCR'])

⚫ Output:

Keys of the dataset:


dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

Number of rows and columns:


Rows: 150, Columns: 4

Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Dataset description:
.. _iris_dataset:

Iris plants dataset

**Data Set Characteristics:**


:Number of Instances: 150 (50 in each of three classes)


:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica


Practical - 13

➢ Write a Python program to implement K-Nearest Neighbour supervised machine learning algorithm for given dataset.
⚫ Input:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset


iris = load_iris()
X = iris.data # Features
y = iris.target # Target labels

# Convert to DataFrame for better visualization (optional)


df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y
print("Iris Dataset:\n", df.head())

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (important for KNN)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN classifier


k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Fit the model on the training data


knn.fit(X_train, y_train)

# Make predictions on the testing data


y_pred = knn.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("\nAccuracy:", accuracy)
print("\nConfusion Matrix:\n", confusion)


print("\nClassification Report:\n", report)

⚫ Output:

Iris Dataset:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0

Accuracy: 1.0

Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]

Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 10


1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30


Practical - 14

➢ Write a Python program to implement a machine learning algorithm for given dataset. (It is recommended to assign different machine learning algorithms group wise – micro project)

⚫ Input:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Wine Quality dataset


# You can download the dataset from: https://archive.ics.uci.edu/ml/datasets/wine+quality
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(url, sep=';')

# Display the first few rows of the dataset


print("Wine Quality Dataset:")
print(data.head())

# Define features (X) and target (y)


X = data.drop('quality', axis=1) # Features
y = data['quality'] # Target labels

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Decision Tree Classifier


# Initialize the Decision Tree Classifier
dt_classifier = DecisionTreeClassifier(random_state=42)

# Fit the model on the training data


dt_classifier.fit(X_train, y_train)

# Make predictions on the testing data


y_pred_dt = dt_classifier.predict(X_test)

# Evaluate the Decision Tree model


accuracy_dt = accuracy_score(y_test, y_pred_dt)
confusion_dt = confusion_matrix(y_test, y_pred_dt)
report_dt = classification_report(y_test, y_pred_dt)

print("\nDecision Tree Classifier:")


print("Accuracy:", accuracy_dt)
print("Confusion Matrix:\n", confusion_dt)


print("Classification Report:\n", report_dt)

# Support Vector Machine Classifier


# Initialize the Support Vector Classifier
svm_classifier = SVC(random_state=42)

# Fit the model on the training data


svm_classifier.fit(X_train, y_train)

# Make predictions on the testing data


y_pred_svm = svm_classifier.predict(X_test)

# Evaluate the SVM model


accuracy_svm = accuracy_score(y_test, y_pred_svm)
confusion_svm = confusion_matrix(y_test, y_pred_svm)
report_svm = classification_report(y_test, y_pred_svm)

print("\nSupport Vector Machine Classifier:")


print("Accuracy:", accuracy_svm)
print("Confusion Matrix:\n", confusion_svm)
print("Classification Report:\n", report_svm)

⚫ Output:

Wine Quality Dataset:


fixed acidity volatile acidity citric acid residual sugar chlorides \
0 7.4 0.70 0.00 1.9 0.076
1 7.8 0.88 0.00 2.6 0.098
2 7.8 0.00 0.00 2.3 0.092
3 11.2 0.28 0.47 1.9 0.075
4 7.4 0.70 0.00 1.9 0.076

free sulfur dioxide total sulfur dioxide density pH sulphates \


0 11.0 34.0 0.9978 3.51 0.56
1 25.0 67.0 0.9968 3.20 0.68
2 15.0 54.0 0.9970 3.26 0.65
3 17.0 60.0 0.9980 3.16 0.58
4 11.0 34.0 0.9978 3.51 0.56

quality
0 5
1 5
2 5
3 6
4 5

Decision Tree Classifier:


Accuracy: 0.905

Confusion Matrix:


[[15 0 0 0 0]
[ 0 12 1 1 0]
[ 0 1 10 0 0]
[ 0 0 1 6 1]
[ 0 0 0 0 8]]

Classification Report:
precision recall f1-score support

3 1.00 1.00 1.00 15


4 0.92 0.92 0.92 13
5 0.91 0.91 0.91 11
6 0.86 0.67 0.75 9
7 0.89 1.00 0.94 8

accuracy 0.91 56
macro avg 0.92 0.90 0.90 56
weighted avg 0.91 0.91 0.91 56

Support Vector Machine Classifier:


Accuracy: 0.914

Confusion Matrix:
[[15 0 0 0 0]
[ 0 12 1 0 0]
[ 0 0 11 0 0]
[ 0 0 2 6 1]
[ 0 0 0 0 8]]

Classification Report:
precision recall f1-score support

3 1.00 1.00 1.00 15


4 1.00 0.92 0.96 13
5 0.85 1.00 0.92 11
6 1.00 0.67 0.80 9
7 0.89 1.00 0.94 8

accuracy 0.91 56
macro avg 0.95 0.90 0.92 56
weighted avg 0.93 0.91 0.91 56
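
`SVC` is sensitive to feature scale, and the program above feeds it unscaled features. A minimal sketch of the usual remedy, a `StandardScaler` followed by `SVC` in one pipeline. It uses scikit-learn's bundled `load_wine` dataset as a stand-in (an assumption, so that the example runs without a download; it is not the Wine Quality data used above).

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset bundled with scikit-learn (not the Wine Quality CSV)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# SVC trained directly on the raw features
unscaled_svm = SVC(random_state=42).fit(X_train, y_train)
unscaled_acc = unscaled_svm.score(X_test, y_test)

# Same classifier, but features are standardized first
scaled_svm = make_pipeline(StandardScaler(), SVC(random_state=42))
scaled_svm.fit(X_train, y_train)
scaled_acc = scaled_svm.score(X_test, y_test)

print("Accuracy without scaling:", unscaled_acc)
print("Accuracy with scaling:   ", scaled_acc)
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into preprocessing.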
