
Data Science Machine Leraning222

Uploaded by

Radhey Mohan
Copyright
© All Rights Reserved

Data Science Machine Learning


MAHAMAYA POLYTECHNIC OF INFORMATION
TECHNOLOGY (AMROHA).

Submitted to

Dr Jaya Singh

For

DIPLOMA OF TECHNOLOGY

In

INFORMATION TECHNOLOGY



By ------------

Name Roll No.


ANKIT SRIVASTAV : E20113835600012
Diploma final year VIth Semester


Practical-01
Q-1 Write a program in Python to implement the Decision Tree algorithm
The decision tree is one of the most powerful and popular algorithms. It falls under the
category of supervised learning algorithms and works for both continuous and
categorical output variables.

Types of Decision Tree Algorithms

There are two types of decision trees. They are categorized based on the type of the target
variable they have. If the decision tree has a categorical target variable, then it is called a
‘categorical variable decision tree’. Similarly, if it has a continuous target variable, it is called
a ‘continuous variable decision tree’.
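The two types described above map onto two separate estimators in scikit-learn. The following is a minimal sketch, not part of the original program: the choice of the bundled iris and diabetes datasets, the depth limit, and the variable names are assumptions made only for illustration.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical target: the iris species, encoded as classes 0, 1 and 2
X_cls, y_cls = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_cls, y_cls)
class_prediction = clf.predict(X_cls[:1])

# Continuous target: a numeric disease-progression score
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, y_reg)
value_prediction = reg.predict(X_reg[:1])

print("Predicted class:", class_prediction[0])
print("Predicted value:", value_prediction[0])
```

The classifier returns one of the known class labels, while the regressor returns a number (the mean target value of the training samples in the matching leaf).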
# Python program to implement the decision tree algorithm and plot the tree

# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Loading the dataset
iris = load_iris()

# Converting the data to a pandas DataFrame
data = pd.DataFrame(data = iris.data, columns = iris.feature_names)

# Creating a separate column for the target variable of the iris dataset
data['Species'] = iris.target

# Replacing the numeric target codes with the actual names of the species
target_dict = dict(zip(np.unique(iris.target), iris.target_names))
data['Species'] = data['Species'].map(target_dict)

# Separating the independent and dependent variables of the dataset
x = data.drop(columns = "Species")
y = data["Species"]
names_features = list(x.columns)

# Splitting the dataset into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 93)

# Creating an instance of the classifier class
dtc = DecisionTreeClassifier(max_depth = 3, random_state = 93)

# Fitting the training dataset to the model
dtc.fit(x_train, y_train)

# Class labels in the order used by the fitted model
target_labels = list(dtc.classes_)

# Plotting the Decision Tree
plt.figure(figsize = (30, 10), facecolor = 'b')
tree.plot_tree(dtc, feature_names = names_features, class_names = target_labels, rounded = True, filled = True, fontsize = 14)
plt.show()

# Predicting the species for the test set
y_pred = dtc.predict(x_test)

# Finding the confusion matrix
confusion_matrix = metrics.confusion_matrix(y_test, y_pred, labels = target_labels)
matrix = pd.DataFrame(confusion_matrix, index = target_labels, columns = target_labels)

# Plotting heatmap (the figure and its axes are created together so the heatmap lands on the sized figure)
sns.set(font_scale = 1.3)
fig, axis = plt.subplots(figsize = (10, 7))
sns.heatmap(matrix, annot = True, fmt = "g", ax = axis, cmap = "magma")
axis.set_title('Confusion Matrix')
axis.set_xlabel("Predicted Values", fontsize = 10)
axis.set_ylabel("True Labels", fontsize = 10)
axis.set_yticklabels(target_labels, rotation = 0)
plt.show()

Practical-02
Q-01 Write a program in Python to implement the K-means algorithm
K-means is an unsupervised learning method for clustering data points. The algorithm
iteratively divides data points into K clusters by minimizing the variance in each cluster.
The best value for K can be estimated with the elbow method, and K-means clustering then
groups the data points into clusters.
Work
First, each data point is randomly assigned to one of the K clusters. Then, we compute the
centroid (functionally the center) of each cluster, and reassign each data point to the cluster
with the closest centroid. We repeat this process until the cluster assignments for each data
point are no longer changing.

K-means clustering requires us to select K, the number of clusters we want to group the data
into. The elbow method lets us graph the inertia (a distance-based metric: the sum of squared
distances from each point to its cluster centroid) against K and visualize the point at which
it starts decreasing roughly linearly. This point is referred to as the "elbow" and is a
good estimate for the best value for K based on our data.
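The elbow method described above can be sketched with scikit-learn's KMeans class. This is an illustrative sketch under assumptions: it reuses the ten sample points from the scatter-plot program in this practical, and the range of K values, n_init and random_state are arbitrary choices rather than values taken from the report.

```python
from sklearn.cluster import KMeans

# The ten sample points from the scatter-plot program in this practical
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Fit K-means for K = 1..10 and record the inertia of each fit
k_values = range(1, 11)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(data)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"K={k}: inertia={inertia:.2f}")
```

Plotting the printed values, for example with plt.plot(list(k_values), inertias, marker="o"), shows the curve on which the elbow is read off.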
Program
import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

plt.scatter(x, y)
plt.show()
Output

import numpy as np
import matplotlib.pyplot as plt

# Import the data (data.csv is expected to hold numeric columns, one point per row)
data = np.loadtxt("data.csv", delimiter=",")

# Choose the number of clusters
k = 3

# Initialize the centroids with k distinct, randomly chosen data points
centroids = data[np.random.choice(len(data), k, replace=False)]

# Repeat until the centroids do not change
while True:
    # Assign each data point to the closest centroid
    # (distances has one row per point and one column per centroid)
    distances = np.linalg.norm(data[:, np.newaxis, :] - centroids, axis=2)
    labels = np.argmin(distances, axis=1)

    # Update the centroids; an empty cluster keeps its previous centroid
    new_centroids = np.array([
        data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
        for i in range(k)
    ])

    # If the centroids have not changed, stop
    if np.allclose(centroids, new_centroids):
        break

    centroids = new_centroids

# Plot the data (first two columns) and the final centroids
plt.scatter(data[:, 0], data[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], c="black", marker="x")
plt.show()

Practical-03

Q-1 Write a program in Python to implement Linear Regression


Regression

The term regression is used when you try to find the relationship between variables.

In machine learning and in statistical modeling, that relationship is used to predict the
outcome of future events.

Linear Regression

Linear regression uses the relationship between the data points to fit a straight line
through them.

This line can be used to predict future values.

Work

Python has methods for finding a relationship between data points and drawing a line of linear
regression. We will use these methods instead of working through the
mathematical formula.
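For reference, the formula that such methods apply can be sketched directly. The block below is a minimal illustration of ordinary least squares for one variable, using the age and speed values from the example that follows; the function name predict_speed and the choice of a 10-year-old car are assumptions for illustration only.

```python
import numpy as np

# Age (x) and speed (y) of the 13 cars used in this practical
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6], dtype=float)
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86], dtype=float)

# Least-squares estimates:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

def predict_speed(age):
    """Return the speed predicted by the fitted line for a given age."""
    return slope * age + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted speed of a 10-year-old car: {predict_speed(10):.2f}")
```

The slope is negative for this data, which matches the downward trend of speed with age visible in the scatter plot.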

In the example below, the x-axis represents age, and the y-axis represents speed. We have
registered the age and speed of 13 cars as they were passing a tollbooth. Let us see if the data
we collected could be used in a linear regression:

Code
import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()

Output

Import scipy and draw the line of Linear Regression:

import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit the line; r is the correlation coefficient of x and y
slope, intercept, r, p, std_err = stats.linregress(x, y)

# Return the point on the fitted line for a given x value
def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

Output

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the data (the last column is treated as the target)
data = np.loadtxt("data.csv", delimiter=",")

# Split the data into features and target
features = data[:, :-1]
target = data[:, -1]

# Create the linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(features, target)

# Predict the values of the target variable for the given features
predictions = model.predict(features)

# Plot the data and the regression line
# (the plot uses the first feature column, so it is meaningful only for single-feature data)
order = np.argsort(features[:, 0])
plt.scatter(features[:, 0], target, color="blue")
plt.plot(features[order, 0], predictions[order], color="red")
plt.show()

Practical-04
Q1 Write a program in Python to implement K-NN
K-NN is a supervised learning algorithm. Supervised learning is learning where the value or
result that we want to predict is present in the training data (labeled data). That value is
known as the target, dependent variable, or response variable.
All the other columns in the dataset are known as features, predictor variables, or
independent variables.
Supervised learning is classified into two categories:
1. Classification: here the target variable consists of categories.
2. Regression: here the target variable is continuous, and we usually try to fit a line
or curve to the data.
k-nearest neighbor algorithm:
This algorithm is used to solve classification problems. The k-nearest neighbor (K-NN)
algorithm effectively creates an imaginary boundary to classify the data. When a new data
point comes in, the algorithm assigns it the class of the training points nearest to it,
which determines the side of the boundary it falls on.
A larger k value gives smoother curves of separation and a less complex model, whereas a
smaller k value tends to overfit the data and gives a more complex model.
Note: It is very important to choose the right k value when analyzing the dataset to avoid
overfitting and underfitting.
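One common way to look for a suitable k value is to compare cross-validated accuracy across several candidates. The sketch below is an illustration only: the iris dataset, the candidate list and the 5-fold setting are assumptions and do not come from the report.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidate values for k
candidate_ks = [1, 3, 5, 7, 9, 11, 15, 21]

# Mean 5-fold cross-validated accuracy for each candidate
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)

for k, score in mean_scores.items():
    print(f"k={k:>2}: mean accuracy={score:.3f}")
print("Best k among the candidates:", best_k)
```

The value that scores best depends on the dataset and on the candidates tried, so the result should be read as a starting point rather than a fixed rule.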
Using the k-nearest neighbor algorithm, we fit the historical data (train the model) and
use it to predict new values. The steps are:
1. Import the k-nearest neighbor algorithm from the scikit-learn package.
2. Create feature and target variables.
3. Split data into training and test data.
4. Generate a k-NN model with the chosen number of neighbors.
5. Train (fit) the model on the training data.
6. Predict the target for new data.
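The six steps above can be sketched as follows. This is a minimal illustration, assuming the bundled iris dataset, an 80/20 split and k = 7; none of these choices come from the report.

```python
# Step 1: import the k-nearest neighbor classifier from scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 2: create feature and target variables
iris = load_iris()
X, y = iris.data, iris.target

# Step 3: split the data into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 4: generate a k-NN model with the chosen number of neighbors
knn = KNeighborsClassifier(n_neighbors=7)

# Step 5: fit the model on the training data
knn.fit(X_train, y_train)

# Step 6: predict the classes of the held-out test data
y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)

print("Predicted classes:", y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```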

How does it work?

K is the number of nearest neighbors to use. For classification, a majority vote is used to
determine which class a new observation should fall into. Larger values of K are often more
robust to outliers and produce more stable decision boundaries than very small values
(K=3 would be better than K=1, which might produce undesirable results).
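The majority vote can be illustrated without any library. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function knn_predict and the toy points (including one deliberately mislabeled outlier) are hypothetical.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    """Classify `query` by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(points, labels)
    )
    # Keep the labels of the k closest points and return the most common one
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two groups plus one outlier labeled "A" inside group "B"
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7), (5.5, 5.5)]
labels = ["A", "A", "A", "B", "B", "B", "A"]

query = (5.4, 5.4)
print("K=1:", knn_predict(points, labels, query, k=1))
print("K=3:", knn_predict(points, labels, query, k=3))
```

With K=1 the single closest point is the outlier, so the query takes its label; with K=3 the two neighboring "B" points outvote it, which is the robustness to outliers described above.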

Now we fit the KNN algorithm with K=1:

from sklearn.neighbors import KNeighborsClassifier

# x and y are lists of point coordinates; classes holds one class label per point
data = list(zip(x, y))

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)

And use it to classify a new data point:

Code

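import matplotlib.pyplot as plt

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]

# Predict the class of the new point with the fitted model
prediction = knn.predict(new_point)

# Plot all points, including the new one, colored by class
plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])
plt.text(x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}")
plt.show()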

Output

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Load the data (the last column is treated as the class label)
data = np.loadtxt("data.csv", delimiter=",")

# Split the data into features and target
features = data[:, :-1]
target = data[:, -1]

# Choose the value of K
k = 5

# Create the KNN model
model = KNeighborsClassifier(n_neighbors=k)

# Fit the model to the data
model.fit(features, target)

# Predict the class of the new data point
# predict() expects a 2-D array with one row per sample; this sample
# assumes that data.csv has exactly three feature columns
new_data = np.array([[1, 2, 3]])
prediction = model.predict(new_data)

# Print the prediction
print(prediction)

# Plot the first two features, colored by class, and mark the new point
plt.scatter(features[:, 0], features[:, 1], c=target, cmap="rainbow")
plt.scatter(new_data[:, 0], new_data[:, 1], color="black", marker="x")
plt.show()
