KJD ML File
Experiment-01
Objective:
To become familiar with the basics of Python programming by
completing the following tasks:
Related Theory:
# Program to add two numbers
num1 = 5
num2 = 10
print(f"The sum of {num1} and {num2} is {num1 + num2}")
Output:
The sum of 5 and 10 is 15
1. Conditional Statements:
number = 7
if number % 2 == 0:
    print(f"{number} is even.")
else:
    print(f"{number} is odd.")
Output:
7 is odd.
2.
Using Functions:
# Program to illustrate the use of functions
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))
Output:
Hello, Alice!
3. Using Dictionaries:
# Program to store and display data using a dictionary
student = {
    "name": "John",
    "age": 20
}
print(student)
Output:
{'name': 'John', 'age': 20}
4.
String Operations:
message = "Hello, Python!"
print(message.upper())   # Convert to uppercase
print(message.lower())   # Convert to lowercase
print(message.replace("Python", "World"))  # Replace a substring
Output:
HELLO, PYTHON!
hello, python!
Hello, World!
Conclusion:
Through these exercises, a range of fundamental programming concepts
was explored, including arithmetic operations, conditional logic,
functions, data structures, and string manipulation. These basic
concepts are crucial for progressing to more advanced Python
programming.
Experiment-2
Objective:
To understand and apply various NumPy functions.
Related Theory:
import numpy as np
# Creating a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:")
print(array)
Output:
2D Array:
[[1 2 3]
[4 5 6]]
1. Array Attributes:
print(f"Shape: {array.shape}")
print(f"Size: {array.size}")
Output:
Shape: (2, 3)
Size: 6
2. Array Slicing and Iteration:
print("Sliced Array:")
print(sliced_array)
6
print(row)
Output:
Sliced Array:
[2 3]
[1 2 3]
[4 5 6]
3.
NumPy Operations:
# Performing operations on the array
sum_array = np.sum(array)
mean_array = np.mean(array)
print(f"Sum of array: {sum_array}")
print(f"Mean of array: {mean_array}")
Output:
Sum of array: 21
Mean of array: 3.5
Conclusion:
This experiment provided a detailed exploration of NumPy's capabilities,
demonstrating its efficiency in numerical computation and data
manipulation. By implementing a range of functions, essential concepts
such as array creation, indexing, slicing, broadcasting, and mathematical
operations were covered, highlighting NumPy’s critical role in scientific
computing.
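For instance, broadcasting (mentioned above) lets NumPy combine arrays of different shapes without writing explicit loops. A short sketch using the same 2D array:

import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])
# The 1-D array [10, 20, 30] is broadcast across both rows of the 2-D array
print(array + np.array([10, 20, 30]))
# [[11 22 33]
#  [14 25 36]]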
Experiment-03
Objective: To visualize data using the Matplotlib and Seaborn libraries.
RELATED THEORY
Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc. with just a few lines of code. For examples, see the sample plots and thumbnail gallery.
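As a minimal illustration (a standalone sketch with made-up values, separate from the experiment code below):

import matplotlib.pyplot as plt

# A complete line plot in just a few lines of code
plt.plot([1, 2, 3, 4], [10, 20, 15, 25], marker='o')
plt.xlabel('x')
plt.ylabel('y')
plt.show()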
#Code:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Sample dataset (column names taken from the plots below; values assumed for illustration)
data = {'Year': [2018, 2019, 2020, 2021, 2022], 'Sales': [100, 120, 150, 170, 200],
        'Profit': [20, 25, 30, 35, 45], 'Expenses': [80, 95, 120, 135, 155]}
df = pd.DataFrame(data)
# Line plot: Sales over the years
plt.figure(figsize=(8, 6))
plt.plot(df['Year'], df['Sales'], marker='o')
plt.title('Line Plot: Sales Over the Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
# Multiple line plot: Sales, Profit, and Expenses
plt.figure(figsize=(10, 6))
plt.plot(df['Year'], df['Sales'], marker='o', label='Sales')
plt.plot(df['Year'], df['Profit'], marker='o', label='Profit')
plt.plot(df['Year'], df['Expenses'], marker='o', label='Expenses')
plt.title('Multiple Line Plot: Sales, Profit, and Expenses Over the Years')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.legend()
plt.grid(True)
plt.show()
# Vertical bar chart (plot body assumed; the conclusion below mentions bar charts)
plt.figure(figsize=(10, 6))
plt.bar(df['Year'], df['Sales'])
plt.grid(True)
plt.show()

# Horizontal bar chart (plot body assumed)
plt.figure(figsize=(10, 6))
plt.barh(df['Year'], df['Expenses'])
plt.grid(True)
plt.show()
# Heatmap of correlations between Sales, Profit, and Expenses
plt.figure(figsize=(8, 6))
sns.heatmap(df[['Sales', 'Profit', 'Expenses']].corr(), annot=True,
            cmap='coolwarm', linewidths=0.5)
plt.show()
Output:
CONCLUSION:
In the above program, we used the Matplotlib and Seaborn libraries to create various visualizations, including line plots, bar charts (both horizontal and vertical), and a heatmap. Several customization options were demonstrated and discussed above.
Experiment-04
Related Theory:
OpenCV: OpenCV (Open Source Computer Vision Library) is a library of programming
functions primarily aimed at real-time computer vision. It provides various
functionalities to manipulate and process images and videos.
#Code:
# Import necessary libraries
import urllib.request
import urllib.error
import numpy as np
import cv2

# Helper to fetch an image from a URL (function name and request construction are
# assumptions; the original definition lines were not preserved)
def load_image_from_url(url):
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        # Open the URL and read the image as a byte array
        resp = urllib.request.urlopen(req)  # Use the request object
        image_array = np.asarray(bytearray(resp.read()), dtype="uint8")
        # Decode the byte array to an image
        img = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        return img
    except urllib.error.HTTPError as e:
        print(f"Error: Unable to load image from URL: {e}")
        return None
# Step 1: Load the image (image_url is a placeholder; the original URL was not preserved)
img_original = load_image_from_url(image_url)

# Check if the image was successfully loaded
if img_original is None:
    print("Error: Unable to load the image from URL.")
else:
    # Step 2: Process the image (convert it to grayscale for demonstration)
    img_gray = cv2.cvtColor(img_original, cv2.COLOR_BGR2GRAY)
Output:
Conclusion:
In this experiment, an image was fetched from a URL, decoded with OpenCV, and converted to grayscale. This simple read-and-convert workflow serves as the foundation for many image processing tasks that can be accomplished using OpenCV.
Experiment-05:
OBJECTIVE: To Implement Linear Regression.
#Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
# Sample dataset (values assumed for illustration; the original data was not preserved)
data = {'Sales': [100, 120, 150, 170, 200, 220, 250, 270, 300, 320],
        'Expenses': [80, 95, 120, 135, 155, 170, 190, 205, 225, 240],
        'Profit': [20, 25, 30, 35, 45, 50, 60, 65, 75, 80]}
df = pd.DataFrame(data)
# Features and target (column selections reconstructed from the plots below)
X_sales = df[['Sales']]             # single feature for simple linear regression
X_all = df[['Sales', 'Expenses']]   # two features for multiple linear regression
y = df['Profit']

X_train_sales, X_test_sales, y_train, y_test = train_test_split(X_sales, y, test_size=0.2, random_state=42)
X_train_all, X_test_all, _, _ = train_test_split(X_all, y, test_size=0.2, random_state=42)
# Fit simple and multiple linear regression models (model-fitting lines reconstructed)
simple_lr = LinearRegression().fit(X_train_sales, y_train)
multiple_lr = LinearRegression().fit(X_train_all, y_train)

plt.figure(figsize=(8, 6))
plt.scatter(df['Sales'], y, color='blue', label='Actual')
plt.scatter(df['Sales'], multiple_lr.predict(df[['Sales', 'Expenses']]),
            color='green', label='Predicted')
plt.title('Multiple Linear Regression: Sales, Expenses vs Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.legend()
plt.grid(True)
plt.show()
print(f"Polynomial Regression (Degree 2) - MSE:
{mean_squared_error(y_test_poly, y_pred_poly):.2f}, R2:
{r2_score(y_test_poly, y_pred_poly):.2f}")
Output:
Experiment-06
Objective: To Implement Logistic Regression.
#Code:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Generate features
X1 = np.random.normal(0, 1, 500) # Feature 1
X2 = np.random.normal(0, 1, 500) # Feature 2
# Generate labels (labelling rule assumed for illustration; the original line was not preserved)
Y = (X1 + X2 > 0).astype(int)
# Create a DataFrame
df = pd.DataFrame({'Feature1': X1, 'Feature2': X2, 'Label': Y})
# Step 3: Split the dataset into training and testing sets (80% training, 20% testing)
X = df[['Feature1', 'Feature2']]
y = df['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the logistic regression model and predict (fitting lines reconstructed)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Confusion Matrix:\n{conf_matrix}')
# Visualize the two classes (scatter-plot call reconstructed; original line not preserved)
sns.scatterplot(data=df, x='Feature1', y='Feature2', hue='Label')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend(title='Label')
plt.show()
Output:
Accuracy: 0.99
Confusion Matrix:
[[42 1]
[ 0 57]]
Classification Report:
precision recall f1-score support
Experiment-07
Objective: To process data using Pandas.
#Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Create a DataFrame (values taken from the "Initial Data" table shown in the output below)
data = {
    'Age': [25, 30, 35, np.nan, 40, 45, 50, 55, 60, 65, 70, 75, 80, np.nan, 90],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female',
               'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male'],
    'Cholesterol': [200, 240, 230, 210, 220, 250, np.nan, 240, 230, 200, 210, np.nan, 250, 260, 270],
    'Has_Disease': [0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)
print("\nData Types:")
print(df.dtypes) # Data types
print("\nInitial Data:")
print(df)
# Fill missing values for 'Age' with the mean age and 'Cholesterol'
with the median
df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Cholesterol'].fillna(df['Cholesterol'].median(), inplace=True)
scaler = StandardScaler()
df[['Age', 'Cholesterol']] = scaler.fit_transform(df[['Age', 'Cholesterol']])
# Remove outliers using z-scores on the numeric columns (z-score computation reconstructed)
z_scores = np.abs(stats.zscore(df[['Age', 'Cholesterol']]))
df = df[(z_scores < 3).all(axis=1)]

# Split the dataset into training and testing sets (80% training, 20% testing)
X = df[['Age', 'Cholesterol']]
y = df['Has_Disease']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Output:
Data Summary:
Age Cholesterol Has_Disease
count 13.000000 13.000000 15.000000
mean 55.384615 231.538462 0.600000
std 20.151669 22.673830 0.507093
min 25.000000 200.000000 0.000000
25% 40.000000 210.000000 0.000000
50% 55.000000 230.000000 1.000000
75% 70.000000 250.000000 1.000000
max 90.000000 270.000000 1.000000
Data Types:
Age float64
Gender object
Cholesterol float64
Has_Disease int64
dtype: object
Initial Data:
Age Gender Cholesterol Has_Disease
0 25.0 Male 200.0 0
1 30.0 Female 240.0 1
2 35.0 Female 230.0 0
3 NaN Male 210.0 1
4 40.0 Male 220.0 0
5 45.0 Female 250.0 1
6 50.0 Male NaN 1
7 55.0 Female 240.0 0
8 60.0 Male 230.0 1
9 65.0 Female 200.0 1
10 70.0 Male 210.0 0
11 75.0 Female NaN 0
12 80.0 Male 250.0 1
13 NaN Female 260.0 1
14 90.0 Male 270.0 1
Missing Values:
Age 2
Gender 0
Cholesterol 2
Has_Disease 0
dtype: int64
Conclusion:
We conclude that Pandas has many selection methods which you can use to slice and dice the dataset based on your queries. It greatly helps with the following data-processing tasks (a short example follows below):
• Deal with missing data
• Add default values
• Remove incomplete rows
• Deal with error-prone columns
• Normalize data types
• Change casing
• Rename columns
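For example, a few of these tasks in Pandas (a small illustrative snippet with made-up values, separate from the experiment code above):

import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', None], 'AGE': [20, None, 30]})
df = df.rename(columns={'AGE': 'age'})           # rename columns
df['age'] = df['age'].fillna(df['age'].mean())   # fill missing data with a default value
df = df.dropna(subset=['name'])                  # remove incomplete rows
print(df)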
Experiment-08
Objective: To implement KNN Algorithm.
RELATED THEORY:
The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near to each other. KNN algorithm:
1. Load the data.
2. Initialize K to your chosen number of neighbors.
3. For each example in the data:
   a. Calculate the distance between the query example and the current example from the data.
   b. Add the distance and the index of the example to an ordered collection.
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances.
5. Pick the first K entries from the sorted collection.
6. Get the labels of the selected K entries.
7. If regression, return the mean of the K labels.
8. If classification, return the mode of the K labels.
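A minimal sketch of these steps for classification (illustrative only; the function name, the Euclidean distance, and the default K are assumptions, not the experiment's code):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Step 3: distance from the query to every training example, stored with its index
    distances = [(np.linalg.norm(query - x), i) for i, x in enumerate(X_train)]
    # Steps 4-5: sort by distance and keep the first K entries
    nearest = sorted(distances)[:k]
    # Step 6: labels of the selected K entries
    labels = [y_train[i] for _, i in nearest]
    # Step 8: classification, so return the mode of the K labels
    return Counter(labels).most_common(1)[0][0]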
Code:
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
# Generate a synthetic classification dataset (parameters assumed; original lines not preserved)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# Step 3: Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Output:
Experiment-09
Objective: To classify data using SVM.
RELATED THEORY:
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.
Code:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import pandas as pd
# Load a sample dataset and split it (dataset choice assumed; original lines not preserved)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

svm_model = SVC(kernel='linear')  # You can choose other kernels like 'rbf', 'poly', etc.
svm_model.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, svm_model.predict(X_test)):.2f}")
Output:
Experiment-10
Objective: To implement neural network.
RELATED THEORY:
Neural networks are a beautiful, biologically inspired programming paradigm that enables a computer to learn from observational data. An Artificial Neural Network (ANN) is based on a collection of connected units or nodes called artificial neurons. ANNs have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games, and medical diagnosis.
Code:
import numpy as np
import matplotlib.pyplot as plt
# Helper functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x <= 0, 0, 1)

def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)
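# A minimal sketch (weight names and shapes are assumptions) of how these helper
# functions combine in a two-layer forward pass, as used by the training loop below:
def forward_pass_sketch(X, W1, b1, W2, b2):
    z1 = X @ W1 + b1        # hidden layer pre-activation
    a1 = relu(z1)           # hidden layer activation
    z2 = a1 @ W2 + b2       # output layer pre-activation
    return softmax(z2)      # class probabilities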
    def train(self, X, y, epochs=1000, learning_rate=0.01):
        for epoch in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output, learning_rate)
            if epoch % 100 == 0:
                loss = -np.mean(y * np.log(output))
                print(f"Epoch {epoch}, Loss: {loss}")
# Accuracy
accuracy = np.mean(predictions == y)
print(f"Training Accuracy: {accuracy * 100:.2f}%")
# Visualizing the decision boundary
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    grid = np.c_[xx.ravel(), yy.ravel()]
    Z = model.predict(grid)
    Z = Z.reshape(xx.shape)
Output:
Experiment-11
Objective: To implement the Decision Tree algorithm on breast cancer data to predict whether a person has cancer or not.
RELATED THEORY:
Decision Tree is the most powerful and popular tool for
classification and prediction. A Decision tree is a
flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an
outcome of the test, and each leaf node (terminal node) holds
a class label. The strengths of decision tree methods are:
1. Decision trees are able to generate understandable rules.
2. Decision trees perform classification without requiring
much computation.
3. Decision trees are able to handle both continuous and
categorical variables.
4. Decision trees provide a clear indication of which fields
are most important for prediction or classification.
The weaknesses of decision tree methods are:
1. Decision trees are less appropriate for estimation tasks
where the goal is to predict the value of a continuous
attribute.
2. Decision trees are prone to errors in classification
problems with many classes and a relatively small number of
training examples.
3. Decision trees can be computationally expensive to train.
The process of growing a decision tree is computationally
expensive. At each node, each candidate splitting field must
be sorted before its best split can be found. In some
algorithms, combinations of fields are used and a search must
be made for optimal combining weights. Pruning algorithms can
also be expensive since many candidate sub-trees must be
formed and compared.
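The split evaluation mentioned above is typically scored with the Gini impurity. A minimal sketch of that calculation (illustrative only, not the experiment's exact helper):

def gini_index(groups, classes):
    # Total number of samples across all candidate groups
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue
        score = 0.0
        labels = [row[-1] for row in group]  # class label is the last column
        for class_val in classes:
            p = labels.count(class_val) / size
            score += p * p
        gini += (1.0 - score) * (size / n_instances)  # weight by group size
    return gini

# Example: a perfect split into two pure groups gives a Gini index of 0.0
print(gini_index([[[1, 0], [2, 0]], [[3, 1], [4, 1]]], [0, 1]))  # 0.0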
Code:
import numpy as np
import pandas as pd
    'Feature2': [1.784783929, 1.169761413, 2.81281357, 2.61995032,
                 2.209014212, 3.162953546, 3.339047188, 0.476683375,
                 3.234550982, 3.319983761],
    'Label': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
                best_index, best_value, best_score, best_groups = index, row[index], gini, groups
    return {'index': best_index, 'value': best_value, 'groups': best_groups}
# Make a prediction with a decision tree (the first half of this function is reconstructed)
def predict(node, row):
    if row[node['index']] < node['value']:
        if isinstance(node['left'], dict):
            return predict(node['left'], row)
        else:
            return node['left']
    else:
        if isinstance(node['right'], dict):
            return predict(node['right'], row)
        else:
            return node['right']
Output:
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Experiment-12
OBJECTIVE: Implementation of Random Forest in Python
RELATED THEORY: Random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction, and the class with the most votes becomes our model's prediction.
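As a toy illustration of this voting rule (the individual tree predictions here are hypothetical):

# Three trees vote for classes 1, 0, and 1; the majority class wins
votes = [1, 0, 1]
forest_prediction = max(set(votes), key=votes.count)
print(forest_prediction)  # 1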
Code:
import numpy as np
import pandas as pd
import random
        for class_val in classes:
            p = class_counts.count(class_val) / size
            score += p * p
        gini += (1.0 - score) * (size / n_instances)
    return gini
    if len(right) <= min_size:
        node['right'] = to_terminal(right)
    else:
        node['right'] = get_best_split(right, n_features)
        split(node['right'], max_depth, min_size, depth+1, n_features)
# Make a prediction with a random forest
def bagging_predict(forest, row):
    predictions = [predict(tree, row) for tree in forest]
    return max(set(predictions), key=predictions.count)
Output:
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=0.0, Predicted=0.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Expected=1.0, Predicted=1.0
Experiment-13
OBJECTIVE: Implementation of K Means Clustering algorithm
RELATED THEORY: K-means is an unsupervised learning method for clustering data points. The algorithm iteratively divides data points into K clusters by minimizing the variance in each cluster. Here, we will show you how to estimate the best value for K using the elbow method, then use K-means clustering to group the data points into clusters.

First, each data point is randomly assigned to one of the K clusters. Then, we compute the centroid (functionally the center) of each cluster, and reassign each data point to the cluster with the closest centroid. We repeat this process until the cluster assignments for each data point are no longer changing.

K-means clustering requires us to select K, the number of clusters we want to group the data into. The elbow method lets us graph the inertia (a distance-based metric) and visualize the point at which it starts decreasing linearly. This point is referred to as the "elbow" and is a good estimate for the best value for K based on our data.
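A minimal sketch of the elbow method described above (this sketch uses scikit-learn's KMeans only to compute the inertia values, which is an assumption; the experiment code below implements K-means from scratch):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Same sample points as in the experiment below
X = np.array([[5.9, 3.2], [4.6, 2.9], [6.2, 2.8], [4.7, 3.2], [5.5, 4.2],
              [5.0, 3.0], [4.9, 3.1], [6.7, 3.1], [5.1, 3.8], [6.0, 3.0]])

# Inertia for K = 1..5; the "elbow" in this curve suggests a good K
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in range(1, 6)]
plt.plot(range(1, 6), inertias, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.show()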
Code:
import numpy as np
import matplotlib.pyplot as plt
# Sample data
data = np.array([
[5.9, 3.2],
[4.6, 2.9],
[6.2, 2.8],
[4.7, 3.2],
[5.5, 4.2],
[5.0, 3.0],
[4.9, 3.1],
[6.7, 3.1],
[5.1, 3.8],
[6.0, 3.0]
])
# Assign each point to the nearest centroid (function definition and loop reconstructed)
def assign_clusters(data, centroids):
    clusters = []
    for point in data:
        distances = [np.linalg.norm(point - centroid) for centroid in centroids]
        cluster = np.argmin(distances)
        clusters.append(cluster)
    return np.array(clusters)
# K-means algorithm
def k_means(data, k, max_iterations=100, tolerance=1e-4):
    centroids = initialize_centroids(data, k)
    for i in range(max_iterations):
        clusters = assign_clusters(data, centroids)
        new_centroids = update_centroids(data, clusters, k)
        # Stop once the centroids move less than the tolerance (convergence check reconstructed)
        if np.all(np.linalg.norm(new_centroids - centroids, axis=1) < tolerance):
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, clusters
# Parameters
k = 3 # Number of clusters
# Run K-means
centroids, clusters = k_means(data, k)
# Plot the clustered points and the final centroids (loop and colour list reconstructed)
colors = ['red', 'green', 'blue']
for i in range(k):
    cluster_points = data[clusters == i]
    plt.scatter(cluster_points[:, 0], cluster_points[:, 1], s=30,
                color=colors[i], label=f'Cluster {i+1}')
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, color='black',
            marker='X', label='Centroids')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()