
EXPERIMENT – 1

AIM
Exploring and demonstrating Python.

1. Variables and Data Types


Variables are used to store data in Python. Python is dynamically typed, meaning the data
type of a variable is inferred when the variable is assigned a value.
Data Types:
 Integers: Whole numbers, e.g., 5, 100, -3
 Floats: Numbers with a decimal point, e.g., 3.14, -1.23
 Strings: Sequences of characters enclosed in either single or double quotes, e.g.,
"Hello", 'World'
 Booleans: Represents True or False values, used for logical operations
 Lists: Ordered, mutable collections that can contain elements of different data types
 Tuples: Ordered, immutable collections
 Dictionaries: Collections of key-value pairs (insertion-ordered since Python 3.7)
 Sets: Unordered collections of unique elements
Python also supports None (a null value) to represent the absence of a value.
Example:

# Define variables of different data types


integer_var = 10
float_var = 3.14
string_var = "Hello, World!"
boolean_var = True
list_var = [1, 2, 3]
tuple_var = (4, 5, 6)
set_var = {7, 8, 9}
dict_var = {"key1": "value1", "key2": "value2"}
none_var = None

# Print variables with their data types


print(f"Integer: {integer_var}, Data type: {type(integer_var)}")
print(f"Float: {float_var}, Data type: {type(float_var)}")
print(f"String: '{string_var}', Data type: {type(string_var)}")
print(f"Boolean: {boolean_var}, Data type: {type(boolean_var)}")
print(f"List: {list_var}, Data type: {type(list_var)}")
print(f"Tuple: {tuple_var}, Data type: {type(tuple_var)}")
print(f"Set: {set_var}, Data type: {type(set_var)}")
print(f"Dictionary: {dict_var}, Data type: {type(dict_var)}")
print(f"None: {none_var}, Data type: {type(none_var)}")

Output:
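To make the mutable/immutable distinction above concrete, a small sketch (the values are arbitrary):

list_var = [1, 2, 3]
tuple_var = (4, 5, 6)
list_var[0] = 99       # Lists are mutable: this succeeds
print(list_var)        # [99, 2, 3]
try:
    tuple_var[0] = 99  # Tuples are immutable: this raises a TypeError
except TypeError as e:
    print("Error:", e)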

2. Conditionals
Conditional statements allow you to execute specific blocks of code based on conditions.
Python uses if, elif, and else for condition checking.
 if: Used to check a condition
 elif: Used for additional conditions if the previous if is false
 else: Executed when none of the conditions in if or elif are met.

Example:
age = 20
if age >= 18:
    print("Adult")
else:
    print("Minor")
Output:
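The example above uses only if and else; a short sketch that also exercises elif (the marks value and thresholds are arbitrary):

marks = 75
if marks >= 90:
    print("Grade A")
elif marks >= 60:
    print("Grade B")  # Runs here, since 60 <= 75 < 90
else:
    print("Grade C")
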
3. Loops
Loops allow you to repeat a block of code multiple times. Python provides two main types of
loops:
 for loop: Used to iterate over a sequence (like a list, tuple, or string) or to repeat a
block of code a specific number of times.
 while loop: Continues to execute a block of code as long as a condition is True.
Example of a for loop:
for i in range(5):
    print(i)  # Output: 0, 1, 2, 3, 4

Output:

Example of a while loop:


count = 0
while count < 8:
    print(count)  # Output: 0, 1, 2, 3, 4, 5, 6, 7
    count += 1
Output:

4. Functions
Functions in Python are defined using the def keyword, and they allow you to organize your
code into reusable blocks.
Functions can have parameters (inputs) and return values (outputs).

Example of a simple function:

def greet(name):
    print(f"Hello, {name}!")

greet("Honey")
Output:
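The greet function above only prints; a minimal sketch of a function that returns a value (the function name add is illustrative):

def add(a, b):
    return a + b

result = add(2, 3)
print(result)  # 5
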
5. Lists
A list is an ordered collection of elements, and it is one of the most commonly used data
structures in Python. Lists are mutable, meaning you can change their contents after they
are created.
Example:
fruits = ["apple", "banana", "cherry"]
fruits.append("orange") # Adds an item to the list
print(fruits) # Output: ['apple', 'banana', 'cherry', 'orange']
Output:

6. Dictionaries
Dictionaries are collections of key-value pairs (insertion-ordered since Python 3.7). Each key is unique and maps to a value. You can access, modify, and add elements using the keys.
Example:
person = {"name": "Honey", "age": 20}
print(person["name"])
person["age"] = 21  # Modify an existing value
person["city"] = "Delhi"  # Add a new key-value pair
print(person)
Output:
7. Error Handling
Python uses try, except, else, and finally to handle exceptions (errors) during execution. This
allows your program to continue running even if an error occurs.
 try: Block of code that might raise an exception
 except: Handles the exception
 else: Executes if no exception occurs
 finally: Executes no matter what, after try and except

Example:
try:
    x = int(input("Enter a number: "))
    result = 10 / x
except ZeroDivisionError:
    print("Cannot divide by zero!")
except ValueError:
    print("Invalid input!")
else:
    print(f"The result is {result}")
finally:
    print("Execution completed.")
Output:

8. Classes and Objects (Object-Oriented Programming)


Python supports object-oriented programming (OOP), which allows you to structure your
code in terms of objects and classes.
 Class: A blueprint for creating objects
 Object: An instance of a class
 Methods: Functions that are associated with a class
Example:
class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model

    def display_info(self):
        print(f"Car Make: {self.make}, Model: {self.model}")

# Creating an object of class Car
my_car = Car("Toyota", "Corolla")
my_car.display_info()
Output:

9. File Handling
Python allows you to interact with files using built-in functions. You can open, read, write,
and close files.
 open(): Opens a file for reading or writing
 read(): Reads the contents of a file
 write(): Writes to a file
 close(): Closes the file (called automatically when the file is opened in a with block)
Example:
# Writing to a file
with open("example.txt", "w") as file:
    file.write("Hello, World!")

# Reading from a file
with open("example.txt", "r") as file:
    content = file.read()
print(content)
Output:

10. List Comprehensions


List comprehensions provide a concise way to create lists. They allow you to generate a new
list by applying an expression to each element in an existing iterable.
Example:
squares = [x**2 for x in range(5)]
print(squares) # Output: [0, 1, 4, 9, 16]
Output:

11. Lambda Functions


Lambda functions are small, anonymous functions that can have any number of arguments
but only one expression. They are often used for short-term operations.
Example:
multiply = lambda x, y: x * y
print(multiply(2, 3))
Output:
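A common short-lived use of a lambda is as a sort key; a small sketch:

pairs = [(1, 'b'), (3, 'a'), (2, 'c')]
pairs.sort(key=lambda p: p[1])  # Sort by the second element of each tuple
print(pairs)  # [(3, 'a'), (1, 'b'), (2, 'c')]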

12. Modules and Libraries


Python has a rich ecosystem of built-in libraries and third-party modules. You can import and
use them in your code using the import keyword.
Example using the math module:
import math
result = math.sqrt(16)
print(result)
Output:

You can also create your own modules and import them into your programs.
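For example, a rough sketch of a user-defined module (the file name mymath.py and the function cube are illustrative):

# mymath.py
def cube(x):
    return x ** 3

# main.py (in the same directory)
import mymath
print(mymath.cube(3))  # 27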

PYTHON LIBRARIES
1. NumPy
NumPy is a Python library used for working with arrays.
It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely.
NumPy stands for Numerical Python.
Some functionalities of NumPy are:
 Array Creation
import numpy as np

# Create a 1D array
array_1d = np.array([1, 2, 3, 4])
print("1D Array:", array_1d)

# Create a 2D array
array_2d = np.array([[1, 2], [3, 4]])
print("2D Array:\n", array_2d)

# Create arrays with zeros, ones, or random numbers


zeros_array = np.zeros((2, 3))
ones_array = np.ones((2, 3))
random_array = np.random.rand(2, 3)
print("Zeros Array:\n", zeros_array)
print("Ones Array:\n", ones_array)
print("Random Array:\n", random_array)

Output:
 Array Operations

import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Element-wise addition, subtraction, multiplication, and division


print("Addition:", array1 + array2)
print("Subtraction:", array1 - array2)
print("Multiplication:", array1 * array2)
print("Division:", array1 / array2)

# Dot product
print("Dot Product:", np.dot(array1, array2))

Output:

 Indexing and Slicing


import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Accessing elements
print("Element at (1,2):", array[1, 2])

# Slicing rows and columns


print("First row:", array[0, :])
print("First column:", array[:, 0])
print("Sub-array:\n", array[1:3, 1:3])
Output:

 Mathematical Functions
import numpy as np

data = np.array([1, 2, 3, 4, 5])

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data))
print("Sum:", np.sum(data))
print("Cumulative Sum:", np.cumsum(data))

Output:

 Linear Algebra
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

# Transpose of a matrix
print("Transpose:\n", np.transpose(matrix))

# Determinant
print("Determinant:", np.linalg.det(matrix))

# Inverse of a matrix
print("Inverse:\n", np.linalg.inv(matrix))

# Eigenvalues and eigenvectors


eigenvalues, eigenvectors = np.linalg.eig(matrix)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
Output:

2. Pandas
Pandas is a Python library widely used for data analysis and manipulation. It provides
structures like DataFrame and Series, which allow you to work with structured data
efficiently. Here are some key functionalities of Pandas along with implementation
examples:
 Creating Data Structure
import pandas as pd

# Create a Series
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print("Series:\n", series)

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Output:
 Indexing and Selecting Data
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Access a column
print("Column A:\n", df['A'])

# Access rows using loc and iloc


print("\nFirst row using loc:\n", df.loc[0])
print("\nFirst row using iloc:\n", df.iloc[0])

# Access specific elements


print("\nElement at (0, 1):", df.iloc[0, 1])

Output:

 Filtering and conditional selection


import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Filter rows where Age > 30


filtered = df[df['Age'] > 30]
print("Filtered DataFrame:\n", filtered)

Output:
 Data Cleaning
import pandas as pd
data = {'A': [1, 2, None], 'B': [None, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Check for missing values


print("Missing Values:\n", df.isnull())

# Fill missing values


df_filled = df.fillna(0)
print("\nFilled DataFrame:\n", df_filled)

# Drop rows with missing values


df_dropped = df.dropna()
print("\nDataFrame after dropping missing values:\n", df_dropped)

Output:

 Groupby Operation

import pandas as pd
data = {'Category': ['A', 'A', 'B', 'B'], 'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Group by 'Category' and calculate sum


grouped = df.groupby('Category').sum()
print("Grouped Data:\n", grouped)

Output:
3. Matplotlib
Matplotlib is a powerful Python library for creating static, interactive, and animated
visualizations. Here are some basic functionalities of Matplotlib with implementation
examples:
 Plotting a Simple Line Graph

import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a line plot


plt.plot(x, y, label='y = 2x')
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Output:

 Scatter Plot

import matplotlib.pyplot as plt

# Data
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# Create a scatter plot


plt.scatter(x, y, color='red', label="Data Points")
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Output:

 Bar Chart

import matplotlib.pyplot as plt

# Data
categories = ['A', 'B', 'C', 'D']
values = [3, 7, 8, 5]

# Create a bar chart


plt.bar(categories, values, color='blue')
plt.title("Bar Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
Output:

 Histogram

import matplotlib.pyplot as plt

# Data
data = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8]

# Create a histogram
plt.hist(data, bins=5, color='green', edgecolor='black')
plt.title("Histogram")
plt.xlabel("Bins")
plt.ylabel("Frequency")
plt.show()
Output:

 Pie Chart

import matplotlib.pyplot as plt

# Data
labels = ['Python', 'Java', 'C++', 'Ruby']
sizes = [50, 30, 15, 5]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']

# Create a pie chart


plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
plt.title("Pie Chart")
plt.show()

Output:

4. SciPy
SciPy is a Python library built on top of NumPy that is widely used for scientific and
numerical computations. It provides modules for optimization, integration, interpolation,
linear algebra, signal processing, statistics, and more. Below are the basic functionalities of
SciPy with examples:
 Linear Algebra Operations

from scipy import linalg


import numpy as np

# Matrix
A = np.array([[3, 2], [1, 4]])

# Compute the determinant


det = linalg.det(A)
print("Determinant:", det)

# Compute the inverse


inverse = linalg.inv(A)
print("\nInverse:\n", inverse)

# Solve a linear system (Ax = b)


b = np.array([6, 8])
x = linalg.solve(A, b)
print("\nSolution to Ax = b:\n", x)

Output:

 Optimization

from scipy.optimize import minimize

# Define a function to minimize


def func(x):
    return (x - 3)**2 + 4

# Minimize the function


result = minimize(func, x0=0) # x0 is the initial guess
print("Optimization Result:\n", result)

Output:

 Integration

from scipy import integrate

# Define a function
def f(x):
    return x**2

# Integrate f(x) from 0 to 3


result, error = integrate.quad(f, 0, 3)
print("Integration Result:", result)

Output:

 Interpolation

from scipy import interpolate


import numpy as np

# Data points
x = [0, 1, 2, 3, 4]
y = [1, 2, 0, 2, 1]

# Create a cubic spline interpolation


f = interpolate.interp1d(x, y, kind='cubic')

# Interpolate at new points


x_new = np.linspace(0, 4, 50)
y_new = f(x_new)

# Plot the result


import matplotlib.pyplot as plt
plt.plot(x, y, 'o', label='Data Points')
plt.plot(x_new, y_new, '-', label='Cubic Spline')
plt.legend()
plt.show()

Output:

 Signal Processing

from scipy.signal import butter, lfilter


import numpy as np
import matplotlib.pyplot as plt

# Create a sample signal


fs = 500 # Sampling frequency
t = np.linspace(0, 1, fs, endpoint=False) # Time vector
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)

# Create a low-pass Butterworth filter


b, a = butter(4, 0.1, btype='low') # Order=4, cutoff=0.1

# Apply the filter


filtered_signal = lfilter(b, a, signal)

# Plot the result


plt.plot(t, signal, label='Original Signal')
plt.plot(t, filtered_signal, label='Filtered Signal', linewidth=2)
plt.legend()
plt.show()
Output:
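The overview above also lists statistics among SciPy's modules; a minimal sketch using scipy.stats.describe to summarize a small sample:

from scipy import stats
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(stats.describe(data))  # Reports n, min/max, mean, variance, skewness, kurtosis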

5. Scikit-learn
Scikit-learn is a powerful Python library used for machine learning, providing simple and
efficient tools for data mining and data analysis. It supports various supervised and
unsupervised learning algorithms, and it's built on top of NumPy, SciPy, and matplotlib.
Below are the basic functionalities of Scikit-learn with examples:

 Data Preprocessing

from sklearn.preprocessing import StandardScaler, MinMaxScaler


import numpy as np

# Data
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])

# Standardization (zero mean, unit variance)


scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)
print("Standardized Data:\n", standardized_data)

# Normalization (scales between 0 and 1)


normalizer = MinMaxScaler()
normalized_data = normalizer.fit_transform(data)
print("\nNormalized Data:\n", normalized_data)

Output:

 Train-Test Split

from sklearn.model_selection import train_test_split


import numpy as np

# Data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([1, 2, 3, 4, 5])

# Split into 80% training and 20% testing


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("X_train:", X_train)
print("X_test:", X_test)

Output:

 Linear Regression

import numpy as np
from sklearn.linear_model import LinearRegression

# Data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Create a Linear Regression model


model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict([[6]])
print("Predicted value for input 6:", y_pred)
Output:

 Logistic Regression

import numpy as np
from sklearn.linear_model import LogisticRegression

# Data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1]) # Binary target variable

# Create and train a Logistic Regression model


model = LogisticRegression()
model.fit(X, y)

# Predict
y_pred = model.predict([[6]])
print("Predicted class for input 6:", y_pred)

Output:

 K-Nearest Neighbors (KNN)

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1]) # Binary target variable
# Create and train a KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predict
y_pred = knn.predict([[6]])
print("Predicted class for input 6:", y_pred)

Output:

EXPERIMENT – 2
AIM
Perform Data Preprocessing like outlier detection, handling missing value, analyzing
redundancy and normalization on different datasets.

THEORY
Data preprocessing is a crucial step in machine learning, ensuring that data is clean,
consistent, and ready for training. Below are common data preprocessing techniques with
Python code using pandas and scikit-learn.

1. Handling Missing Values


Missing values can be handled by removing them or imputing them.
import pandas as pd
from sklearn.impute import SimpleImputer
# Sample data
data = {'Age': [25, 30, None, 35, 40],
        'Salary': [50000, 60000, 75000, None, 90000]}
df = pd.DataFrame(data)
# Impute missing values with mean
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Salary']] = imputer.fit_transform(df[['Age', 'Salary']])
print(df)
Output:

2. Encoding Categorical Data


Machine learning models work with numerical values, so categorical data must be encoded.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import numpy as np
import pandas as pd
# Sample categorical data
df = pd.DataFrame({'City': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore', 'Mumbai']})
# Label Encoding
label_encoder = LabelEncoder()
df['City_Label'] = label_encoder.fit_transform(df['City'])
# One-Hot Encoding
one_hot_encoder = OneHotEncoder(sparse_output=False)
encoded = one_hot_encoder.fit_transform(df[['City']])
# Convert to DataFrame
df_encoded = pd.DataFrame(encoded,
                          columns=one_hot_encoder.get_feature_names_out(['City']))
df = pd.concat([df, df_encoded], axis=1)
print(df)
Output:

3. Feature Scaling
Feature scaling standardizes or normalizes numerical features.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Sample data
data = {'Height': [150, 160, 170, 180, 190],
        'Weight': [50, 60, 70, 80, 90]}
df = pd.DataFrame(data)
# Standardization (Z-score normalization)
scaler = StandardScaler()
df[['Height', 'Weight']] = scaler.fit_transform(df[['Height', 'Weight']])
# Normalization to [0, 1] would use MinMaxScaler the same way:
# df[['Height', 'Weight']] = MinMaxScaler().fit_transform(df[['Height', 'Weight']])
print(df)

Output:
4. Feature Engineering (Polynomial Features)
Creating new features from existing ones.

import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
# Sample data
data = {'Feature': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Polynomial Features (Degree 2)
poly = PolynomialFeatures(degree=2, include_bias=False)
df_poly = pd.DataFrame(poly.fit_transform(df), columns=['Feature', 'Feature^2'])
print(df_poly)
Output:

5. Dimensionality Reduction (PCA)


Reduces the number of features while preserving variance.
import pandas as pd
from sklearn.decomposition import PCA
import numpy as np
# Sample data
np.random.seed(42)
data = np.random.rand(5, 3) # 5 samples, 3 features
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
# Applying PCA
pca = PCA(n_components=2)
df_pca = pd.DataFrame(pca.fit_transform(df), columns=['PC1', 'PC2'])
print(df_pca)
print("Explained variance ratio:", pca.explained_variance_ratio_)  # Share of variance kept by each component
Output:

6. Handling Outliers (Using IQR)


Detecting and removing outliers using the Interquartile Range (IQR) method.
import numpy as np
import pandas as pd
# Sample data
data = {'Salary': [50000, 60000, 75000, 90000, 120000, 300000]} # 300000 is an outlier
df = pd.DataFrame(data)
# Calculate IQR
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1
# Define bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Remove outliers
df_cleaned = df[(df['Salary'] >= lower_bound) & (df['Salary'] <= upper_bound)]
print(df_cleaned)

Output:

EXPERIMENT – 3
AIM
Write a program to implement decision trees based on the ID3, C4.5, and CART algorithms.

THEORY
A decision tree is a supervised learning algorithm used for both classification and regression
tasks. It models decisions in a tree-like structure where:
 Each internal node represents a feature (attribute).
 Each branch represents a decision based on that feature’s value.
 Each leaf node represents a class label (for classification) or a numeric output (for
regression).
ID3
The ID3 algorithm, developed by Ross Quinlan, builds a decision tree using Entropy and
Information Gain as splitting criteria.
How ID3 Works
 Entropy (H) measures the impurity (randomness) in a dataset:
o If all examples belong to one class, entropy is 0 (pure).
o If the examples are evenly split between two classes, entropy is 1 (maximum impurity for a binary split).
 Information Gain (IG) measures how much a feature reduces entropy:
o The feature with the highest Information Gain is chosen as the root node.
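As a quick illustration, a hypothetical node with 9 "Yes" and 5 "No" examples has entropy -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940; a short check:

import numpy as np
p_yes, p_no = 9/14, 5/14
print(round(-(p_yes * np.log2(p_yes) + p_no * np.log2(p_no)), 3))  # 0.94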

CODE
import numpy as np
import pandas as pd
from collections import Counter

def entropy(data):
    labels = data.iloc[:, -1]  # Assuming last column is the target
    label_counts = Counter(labels)
    total = len(labels)
    ent = -sum((count/total) * np.log2(count/total) for count in label_counts.values())
    return ent

def information_gain(data, feature):
    total_entropy = entropy(data)
    values = data[feature].unique()
    weighted_entropy = sum((len(subset)/len(data)) * entropy(subset) for value in values
                           if len(subset := data[data[feature] == value]) > 0)
    gain = total_entropy - weighted_entropy
    print(f"Information Gain for {feature}: {gain:.4f}")
    return gain

def best_feature(data):
    features = data.columns[:-1]  # Exclude target column
    return max(features, key=lambda feature: information_gain(data, feature))

def id3(data, tree=None, depth=0):
    labels = data.iloc[:, -1]
    if len(set(labels)) == 1:
        return labels.iloc[0]  # Pure class
    if len(data.columns) == 1:
        return labels.mode()[0]  # Majority class
    best_feat = best_feature(data)
    print(f"\nBest Feature at depth {depth}: {best_feat}")
    if tree is None:
        tree = {}
    tree[best_feat] = {}
    for value in data[best_feat].unique():
        subset = data[data[best_feat] == value].drop(columns=[best_feat])
        tree[best_feat][value] = id3(subset, depth=depth+1)
    return tree

def print_tree(tree, indent=""):
    if not isinstance(tree, dict):
        print(indent + "-> " + str(tree))
        return
    for key, subtree in tree.items():
        print(indent + str(key))
        for value, subsubtree in subtree.items():
            print(indent + f"  {value}:")
            print_tree(subsubtree, indent + "    ")

# Example dataset
columns = ["CGPA", "Interactiveness", "Practical Knowledge", "Skills", "Placed"]
data = pd.DataFrame([
    ["High", "Good", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"],
    ["Medium", "Average", "Average", "Medium", "No"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Medium", "Good", "Good", "Medium", "Yes"],
    ["Low", "Average", "Poor", "Weak", "No"],
    ["Medium", "Good", "Average", "Medium", "Yes"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"]
], columns=columns)

# Build and display tree
tree = id3(data)
print("\nDecision Tree:")
print_tree(tree)
Output:

C4.5
C4.5, developed by Ross Quinlan, is an extension of ID3 with improvements.
Key Improvements in C4.5
1. Handles both categorical and numerical data
o If a numerical feature is selected, it finds the best threshold (e.g., age > 30).
2. Uses Gain Ratio instead of Information Gain
o Gain Ratio solves the bias in Information Gain by normalizing it.
o Formula: Gain Ratio = Information Gain / Split Information
o Split Information is the entropy of the split itself, -Σv (|Sv|/|S|) * log2(|Sv|/|S|); it prevents the algorithm from favoring attributes with many unique values.
3. Handles missing values
o It assigns probabilities for missing values.
4. Pruning to reduce overfitting
o Uses post-pruning, removing branches that add little value.

CODE
import numpy as np
import pandas as pd
from collections import Counter

def entropy(data):
    labels = data.iloc[:, -1]  # Assuming last column is the target
    label_counts = Counter(labels)
    total = len(labels)
    ent = -sum((count/total) * np.log2(count/total) for count in label_counts.values())
    return ent

def split_info(data, feature):
    values = data[feature].unique()
    total = len(data)
    split_ent = -sum((len(subset)/total) * np.log2(len(subset)/total) for value in values
                     if len(subset := data[data[feature] == value]) > 0)
    return split_ent

def gain_ratio(data, feature):
    gain = information_gain(data, feature)
    split = split_info(data, feature)
    ratio = gain / split if split != 0 else 0
    print(f"Split Info for {feature}: {split:.4f}")
    print(f"Gain Ratio for {feature}: {ratio:.4f}")
    return ratio

def information_gain(data, feature):
    total_entropy = entropy(data)
    values = data[feature].unique()
    weighted_entropy = sum((len(subset)/len(data)) * entropy(subset) for value in values
                           if len(subset := data[data[feature] == value]) > 0)
    gain = total_entropy - weighted_entropy
    print(f"Information Gain for {feature}: {gain:.4f}")
    return gain

def best_feature(data):
    features = data.columns[:-1]  # Exclude target column
    return max(features, key=lambda feature: gain_ratio(data, feature))

def c45(data, tree=None, depth=0):
    labels = data.iloc[:, -1]
    if len(set(labels)) == 1:
        return labels.iloc[0]  # Pure class
    if len(data.columns) == 1:
        return labels.mode()[0]  # Majority class
    best_feat = best_feature(data)
    print(f"\nBest Feature at depth {depth}: {best_feat}")
    if tree is None:
        tree = {}
    tree[best_feat] = {}
    for value in data[best_feat].unique():
        subset = data[data[best_feat] == value].drop(columns=[best_feat])
        tree[best_feat][value] = c45(subset, depth=depth+1)
    return tree

def print_tree(tree, indent=""):
    if not isinstance(tree, dict):
        print(indent + "-> " + str(tree))
        return
    for key, subtree in tree.items():
        print(indent + str(key))
        for value, subsubtree in subtree.items():
            print(indent + f"  {value}:")
            print_tree(subsubtree, indent + "    ")

columns = ["CGPA", "Interactiveness", "Practical Knowledge", "Skills", "Placed"]
data = pd.DataFrame([
    ["High", "Good", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"],
    ["Medium", "Average", "Average", "Medium", "No"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Medium", "Good", "Good", "Medium", "Yes"],
    ["Low", "Average", "Poor", "Weak", "No"],
    ["Medium", "Good", "Average", "Medium", "Yes"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"]
], columns=columns)

# Build and display C4.5 decision tree
tree = c45(data)
print("\nC4.5 Decision Tree:")
print_tree(tree)

Output:
CART
CART, developed by Breiman et al., is another decision tree algorithm. Unlike ID3 and C4.5,
it:
 Works for both classification and regression.
 Uses the Gini Index (instead of entropy) to find the best split.
 If Gini = 0, the node is pure (only one class present).
 Higher Gini means a more impure node (the maximum is 0.5 for a two-class split).
Splitting is done in a binary way (each node splits into two branches only).
Example: Instead of splitting on Color (Red, Blue, Green), CART creates binary splits like Color
= Red?.
Regression Trees use Mean Squared Error (MSE) instead of Gini.
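As a quick illustration, a hypothetical node with 9 "Yes" and 5 "No" examples has Gini = 1 - (9/14)^2 - (5/14)^2 ≈ 0.459; a one-line check:

print(round(1 - (9/14)**2 - (5/14)**2, 4))  # 0.4592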

CODE
import numpy as np
import pandas as pd
from collections import Counter

def gini_index(data):
    labels = data.iloc[:, -1]
    label_counts = Counter(labels)
    total = len(labels)
    gini = 1 - sum((count/total) ** 2 for count in label_counts.values())
    return gini

def gini_split(data, feature):
    values = data[feature].unique()
    total = len(data)
    weighted_gini = 0
    for value in values:
        subset = data[data[feature] == value]
        if len(subset) > 0:
            gini_val = gini_index(subset)
            print(f"Gini index for {feature} = {value}: {gini_val:.4f}")
            weighted_gini += (len(subset)/total) * gini_val
    return weighted_gini

def best_feature_cart(data):
    features = data.columns[:-1]
    return min(features, key=lambda feature: gini_split(data, feature))

def cart(data, tree=None, depth=0):
    labels = data.iloc[:, -1]
    if len(set(labels)) == 1:
        return labels.iloc[0]
    if len(data.columns) == 1:
        return labels.mode()[0]
    best_feat = best_feature_cart(data)
    print(f"\nBest Feature at depth {depth}: {best_feat}")
    if tree is None:
        tree = {}
    tree[best_feat] = {}
    for value in data[best_feat].unique():
        subset = data[data[best_feat] == value].drop(columns=[best_feat])
        tree[best_feat][value] = cart(subset, depth=depth+1)
    return tree

def print_tree(tree, indent=""):
    if not isinstance(tree, dict):
        print(indent + "-> " + str(tree))
        return
    for key, subtree in tree.items():
        print(indent + str(key))
        for value, subsubtree in subtree.items():
            print(indent + f"  {value}:")
            print_tree(subsubtree, indent + "    ")

# Example dataset
columns = ["CGPA", "Interactiveness", "Practical Knowledge", "Skills", "Placed"]
data = pd.DataFrame([
    ["High", "Good", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"],
    ["Medium", "Average", "Average", "Medium", "No"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Medium", "Good", "Good", "Medium", "Yes"],
    ["Low", "Average", "Poor", "Weak", "No"],
    ["Medium", "Good", "Average", "Medium", "Yes"],
    ["High", "Excellent", "Excellent", "Strong", "Yes"],
    ["Low", "Poor", "Weak", "Weak", "No"],
    ["Medium", "Average", "Good", "Medium", "Yes"],
    ["High", "Good", "Good", "Strong", "Yes"]
], columns=columns)

# Build and display CART decision tree
tree = cart(data)
print("\nCART Decision Tree:")
print_tree(tree)

Output:
EXPERIMENT – 4
AIM
To implement a simple Artificial Neural Network (ANN) with the Backpropagation algorithm from scratch using Python and NumPy, and test it on a suitable dataset (here, a binary subset of the Iris dataset).

THEORY
An Artificial Neural Network (ANN) is inspired by the structure of biological neurons. It
contains:
 Input Layer
 Hidden Layer(s)
 Output Layer
The Backpropagation algorithm is used to minimize error by updating weights using gradients: each weight is adjusted as w ← w − η * ∂E/∂w, where η is the learning rate and E is the loss.

Dataset Used:
Iris dataset: A famous dataset consisting of 3 types of Iris flowers (Setosa, Versicolor,
Virginica) with 4 features:
 Sepal Length
 Sepal Width
 Petal Length
 Petal Width
We'll simplify it for binary classification:
 Class 0: Setosa
 Class 1: Versicolor
(We’ll ignore Virginica to keep it binary.)
 Input Layer: 4 neurons (4 features)
 Hidden Layer: 5 neurons
 Output Layer: 1 neuron (binary output)

CODE
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Activation and loss
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def binary_cross_entropy(y_true, y_pred):
    epsilon = 1e-9
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# ANN class
class SimpleANN:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, output, learning_rate=0.1):
        m = y.shape[0]
        dz2 = output - y
        dW2 = np.dot(self.a1.T, dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m

        dz1 = np.dot(dz2, self.W2.T) * sigmoid_derivative(self.a1)
        dW1 = np.dot(X.T, dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m

        # Update weights
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    def train(self, X, y, epochs=1000, learning_rate=0.1):
        for i in range(epochs):
            output = self.forward(X)
            loss = binary_cross_entropy(y, output)
            self.backward(X, y, output, learning_rate)
            if i % 100 == 0:
                print(f"Epoch {i}, Loss: {loss:.4f}")

    def predict(self, X):
        output = self.forward(X)
        return (output > 0.5).astype(int)

# Load dataset
iris = load_iris()
X = iris.data[:100]  # Only Setosa and Versicolor
y = iris.target[:100].reshape(-1, 1)  # 0 or 1

# Preprocessing
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Create and train model
model = SimpleANN(input_size=4, hidden_size=5, output_size=1)
model.train(X_train, y_train, epochs=1000, learning_rate=0.1)

# Test accuracy
predictions = model.predict(X_test)
accuracy = np.mean(predictions == y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

# 🔍 Predict on user input
print("\n--- Predict a New Sample ---")
sample = np.array([[5.1, 3.5, 1.4, 0.2]])  # Example: likely Setosa
sample_scaled = scaler.transform(sample)
result = model.predict(sample_scaled)
print("Predicted Class:", "Setosa" if result[0][0] == 0 else "Versicolor")

Output
EXPERIMENT – 5

AIM
To implement the K-Nearest Neighbors (K-NN) algorithm from scratch in Python and use it to classify data points from the Wine dataset. Display both correct and wrong predictions.

THEORY
 K-NN is a lazy, instance-based learning algorithm.
 For each test point, it finds the K nearest points in the training set using Euclidean
distance, and predicts the most common class among those neighbors.
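For instance, the distance between the points (1, 2) and (4, 6) is sqrt(3^2 + 4^2) = 5; a quick check:

import numpy as np
print(np.sqrt(np.sum((np.array([1, 2]) - np.array([4, 6])) ** 2)))  # 5.0
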
Dataset Used:
Wine dataset (from sklearn.datasets)
 178 samples
 13 features (like alcohol, magnesium, etc.)
 3 classes (0, 1, 2) — different wine cultivars

CODE
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from collections import Counter

# Euclidean Distance Function
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

# KNN Class
class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        self.X_train = X_train
        self.y_train = y_train

    def predict(self, X_test):
        predictions = []
        for x in X_test:
            distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
            k_indices = np.argsort(distances)[:self.k]
            k_labels = [self.y_train[i] for i in k_indices]
            most_common = Counter(k_labels).most_common(1)[0][0]
            predictions.append(most_common)
        return np.array(predictions)

# Load Wine Dataset
data = load_wine()
X, y = data.data, data.target

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the model
model = KNN(k=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Evaluate model
print("Correct Predictions:")
for i in range(len(y_test)):
    if predictions[i] == y_test[i]:
        print(f"Sample {i}: Predicted = {predictions[i]}, Actual = {y_test[i]} ✅")

print("\nWrong Predictions:")
for i in range(len(y_test)):
    if predictions[i] != y_test[i]:
        print(f"Sample {i}: Predicted = {predictions[i]}, Actual = {y_test[i]} ❌")

# Take a sample input for prediction
sample_input = [13.0, 2.3, 2.4, 15.6, 100.0, 2.8, 2.5, 0.3, 1.9, 5.0, 1.0, 3.0, 1000.0]
sample_scaled = scaler.transform([sample_input])  # Normalize like training data
sample_prediction = model.predict(sample_scaled)
print(f"\n📌 Predicted class for sample input = {sample_prediction[0]}")

OUTPUT
