
ML (Lab Programs)

The document provides a comprehensive guide on setting up Python and essential libraries such as NumPy, Pandas, and Scikit-learn for machine learning. It includes detailed installation steps, sample programs for data manipulation, visualization, and machine learning tasks, as well as methods for handling missing data and encoding categorical variables. The document serves as a practical resource for beginners and practitioners in data science and machine learning.

Uploaded by

itadeekshu19
Copyright © All Rights Reserved

Machine Learning

Lab Programs

Lab Program 1: Install and set up Python and essential libraries like NumPy and Pandas.

● Installing and setting up Python

● Open the official download page, https://www.python.org/downloads/, in a web browser (or search for "Python download" in Google) and download the latest version of Python.

● At the time of writing the latest version is Python 3.12.1. Download the 64-bit or 32-bit Windows installer (an executable .exe file) that matches your system.

Downloading the Python Installer

● Open the downloaded .exe file (for example, the Python 3.12.1 64-bit installer) to launch the Python installer.

● Choose the option to install the launcher for all users by checking the corresponding checkbox.

● Verify the Python installation in Windows:

Search for IDLE (Python's Integrated Development and Learning Environment) in the Windows search bar. You should see an entry such as "IDLE (Python 3.12 64-bit)". Open it, and the IDLE window shows the installed Python version at the top.

This confirms that Python has been installed successfully.


Installation of the essential packages NumPy and Pandas

a. Install the NumPy package:

NumPy is an open-source Python library for efficient numerical operations on large quantities of data. It provides an array object and functions for numerical computation on both one-dimensional and multidimensional arrays, and several of its functions can also be applied directly to pandas DataFrames. Calculations on NumPy arrays are much faster than equivalent loops over ordinary Python lists, because the work is done in compiled code. NumPy can also handle very large amounts of data efficiently.
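The claim about speed can be checked with a small sketch that is not part of the original lab. It computes the sum of the squares of one million integers in two ways: once with a Python loop over a list, and once with a single vectorized NumPy expression. The array size is an arbitrary choice, and the timings will vary from machine to machine.

```python
import time
import numpy as np

n = 1_000_000
values_list = list(range(n))                  # ordinary Python list
values_array = np.arange(n, dtype=np.int64)   # NumPy array holding the same numbers

# Sum of squares using an explicit Python loop
start = time.perf_counter()
loop_total = sum(v * v for v in values_list)
loop_seconds = time.perf_counter() - start

# The same computation as one vectorized NumPy expression
start = time.perf_counter()
numpy_total = int(np.sum(values_array * values_array))
numpy_seconds = time.perf_counter() - start

print("Results match:", loop_total == numpy_total)
print(f"Python loop: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.4f} s")
```

The largest intermediate sum here is about 3.3e17, which fits comfortably in int64.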
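● Steps to install NumPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\

Step 3: Install NumPy by typing the command:
pip install numpy
(If the pip command is not found, use python -m pip install numpy instead.)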

b. Install the pandas package:

Pandas is a very popular library for working with data. Its stated goal is to be the most powerful and flexible open-source data analysis and manipulation tool, and in our opinion it has reached that goal. DataFrames are at the center of pandas. A DataFrame is structured like a table or spreadsheet: both the rows and the columns have indexes (labels), and you can perform operations on rows or on columns separately. Pandas supports the five main steps of data processing and analysis, whatever the origin of the data: load, manipulate, prepare, model, and analyze.
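The sketch below builds a minimal DataFrame with labelled rows and columns. It shows one operation applied down each column, one applied across each row, and a single-cell lookup by labels. The student names and marks are invented for illustration.

```python
import pandas as pd

# A small table: rows are labelled by student name, columns by subject
marks = pd.DataFrame(
    {"Maths": [78, 92, 65], "Physics": [85, 88, 70]},
    index=["Asha", "Ravi", "Meena"],
)

column_means = marks.mean()                  # down each column: one mean per subject
row_totals = marks.sum(axis=1)               # across each row: one total per student
ravi_physics = marks.loc["Ravi", "Physics"]  # one cell, selected by row and column label

print(marks)
print("Mean per subject:\n", column_means)
print("Total per student:\n", row_totals)
print("Ravi's Physics mark:", ravi_physics)
```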

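● Steps to install pandas:

Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\

Step 3: Install pandas by typing the command:
pip install pandas
(If the pip command is not found, use python -m pip install pandas instead.)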

Simple program that prints the installed library versions to confirm a successful installation:

import numpy
import pandas

print("numpy library version is: ")
print(numpy.__version__)  # two underscores on each side of "version"
print("numpy library is successfully installed")
print(" ")

print("pandas library version is: ")
print(pandas.__version__)
print("pandas library is successfully installed")

Lab Program 2: Introduce scikit-learn as a machine learning library.

Scikit-learn is a popular open-source machine learning library in Python that offers a comprehensive set of tools and algorithms for data analysis, modeling, and machine learning tasks. It is built on foundational libraries like NumPy, SciPy, and Matplotlib. Scikit-learn provides a user-friendly and efficient framework for both beginners and experts in the field of data science.

Some key points to introduce scikit-learn as a machine learning library:

1. Comprehensive Machine Learning Library: Scikit-learn offers a wide range of machine learning algorithms and tools for tasks such as classification, regression, clustering, dimensionality reduction, and more.

2. User-Friendly and Easy to Use: It is designed with a user-friendly interface and simple syntax, making it accessible for both beginners and experienced machine learning practitioners.

3. Integration with Scientific Computing Libraries: Scikit-learn integrates well with other scientific computing libraries in Python such as NumPy, SciPy, and Matplotlib, providing a powerful environment for machine learning tasks.

4. Extensive Documentation and Community Support: The library comes with comprehensive documentation, tutorials, and examples to help users understand and implement machine learning algorithms effectively. Additionally, a vibrant community around scikit-learn provides support and contributions.

5. Efficient Implementation of Algorithms: Scikit-learn is built on top of NumPy, SciPy, and Cython, which allows for efficient implementation of machine learning algorithms and scalability to large datasets.

6. Support for Model Evaluation and Validation: The library provides tools for model evaluation, hyperparameter tuning, cross-validation, and performance metrics, enabling users to assess and improve the quality of their machine learning models.

7. Flexibility and Customization: Scikit-learn offers flexibility for customization and parameter tuning, allowing users to adapt algorithms to their specific requirements and datasets.

8. Wide Adoption and Industry Usage: Due to its ease of use, performance, and versatility, scikit-learn is widely adopted in academia, research, and industry for various machine learning applications.
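Points 1, 2 and 6 describe a consistent interface, which this short sketch makes concrete. Every estimator is created, trained with fit(), and used with predict(), and the same cross-validation helper works with any of them. The sketch uses the Iris dataset that ships with scikit-learn. The choice of the two models and of 5 folds is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Bundled example dataset: 150 iris flowers, 4 numeric features, 3 species
X, y = load_iris(return_X_y=True)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # Same API for every estimator: cross_val_score fits and scores it on each fold
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")

# fit() returns the trained estimator, which can then predict() on new samples
first_model = models["k-NN"].fit(X, y)
print("Predicted class for the first flower:", first_model.predict(X[:1])[0])
```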

Overall, scikit-learn is a powerful and versatile machine learning library in Python that empowers users to build and deploy machine learning models efficiently for a wide range of tasks and applications.

Lab Program 3: Install and set up scikit-learn and other necessary tools.
pip is the package manager for Python. It lets you install and manage libraries and dependencies that supplement the standard library. (A package contains all the files you need for a module, and modules are Python code libraries that you can include in your projects.) pip3 is not a different tool. It is the name of the pip command that belongs to a Python 3 installation, which helps on systems where Python 2 and Python 3 are both installed. Python 3.4 and later ship with pip, so recent Python 3.x installations can use the pip command to install libraries directly.
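Besides running pip from the Command Prompt, you can also check from Python which distributions pip has installed. This sketch uses the standard-library importlib.metadata module, available in Python 3.8 and later. The helper name installed_version is hypothetical, chosen for illustration. It looks packages up by their pip distribution name, which for scikit-learn differs from the import name sklearn.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(distribution_name):
    """Hypothetical helper: return the installed version string, or None if not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None

# Distribution names as used with "pip install ..."
for name in ["numpy", "pandas", "scipy", "matplotlib", "scikit-learn"]:
    found = installed_version(name)
    if found:
        print(f"{name}: {found}")
    else:
        print(f"{name}: not installed (run: pip install {name})")
```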

Scikit-learn (sklearn) library:

Scikit-learn is one of the most widely used machine learning libraries for Python. It provides modules for data analysis and statistical modelling, including efficient tools for classification, regression, clustering, and dimensionality reduction, all through a consistent interface. The library is largely written in Python and is built on NumPy, SciPy and Matplotlib. It also works well with Pandas.

Installing the required libraries

● Steps to install NumPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\
Step 3: Install NumPy by typing the command:
pip install numpy

● Steps to install pandas:
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\
Step 3: Install pandas by typing the command:
pip install pandas

● Steps to install Matplotlib:
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\
Step 3: Install Matplotlib by typing the command:
pip install matplotlib

● Steps to install SciPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\
Step 3: Install SciPy by typing the command:
pip install scipy

● Steps to install scikit-learn (sklearn):
Step 1: Open the Command Prompt (CMD).
Step 2: Change to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312\
Step 3: Install scikit-learn by typing the command:
pip install scikit-learn

All five libraries can also be installed with one command: pip install numpy pandas matplotlib scipy scikit-learn
(If the pip command is not found, prefix it with python -m, for example python -m pip install scikit-learn.)

Simple program that prints the installed library versions to confirm a successful installation:
import numpy
import pandas
import scipy
import matplotlib
import sklearn
print("numpy library version is: ")
print(numpy.__version__) #please type two underscore symbols.
print("numpy library is successfully installed")
print(" ")
print("pandas library version is: ")
print(pandas.__version__)
print("pandas library is successfully installed")
print(" ")
print("scipy library version is: ")
print(scipy.__version__)
print("scipy library is successfully installed")
print(" ")
print("matplotlib library version is: ")
print(matplotlib.__version__)
print("matplotlib library is successfully installed")
print(" ")
print("scikit-learn(sklearn) library version is: ")
print(sklearn.__version__)
print("sklearn library is successfully installed")

Lab Program 4: Write a program to load and explore datasets stored as .csv and Excel files using pandas.
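import pandas as pd

# Paths to the sample files (adjust to where your files are stored)
csv_file_path = 'C:\\ML_Projects\\sample_data.csv'
excel_file_path = 'C:\\ML_Projects\\sample_data.xlsx'

# Load the CSV file into a DataFrame
data_csv = pd.read_csv(csv_file_path)
print("CSV File data:")
print(data_csv)

# Load the Excel file (reading .xlsx files requires the openpyxl package: pip install openpyxl)
data_excel = pd.read_excel(excel_file_path)
print("\nExcel File data:")
print(data_excel)

# Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns
print("\nData Descriptions:")
print("CSV data Description:")
print(data_csv.describe())
print("\nExcel data Description:")
print(data_excel.describe())

# Data type of each column
print("\nDatatypes in CSV files:")
print(data_csv.dtypes)
print("\nDatatypes in Excel files:")
print(data_excel.dtypes)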
Output

CSV File data:
      Name  Age  Score
0    Manoj   19     95
1    Dilip   20     97
2  Manjula   40     35
3   Rakesh   24     45
4   Kushal   22     80

Excel File data:
      Name  Course  Sem
0   Rajesh     BCA    1
1   Ramesh     BCA    2
2    Swati    BCOM    1
3  Florina    BCOM    3
4    Pooja     BBA    2
5    Raghu     BBA    4

Data Descriptions:
CSV data Description:
             Age      Score
count   5.000000   5.000000
mean   25.000000  70.400000
std     8.602325  28.736736
min    19.000000  35.000000
25%    20.000000  45.000000
50%    22.000000  80.000000
75%    24.000000  95.000000
max    40.000000  97.000000

Excel data Description:
            Sem
count  6.000000
mean   2.166667
std    1.169045
min    1.000000
25%    1.250000
50%    2.000000
75%    2.750000
max    4.000000

Datatypes in CSV files:
Name     object
Age       int64
Score     int64
dtype: object

Datatypes in Excel files:
Name      object
Course    object
Sem        int64
dtype: object
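If the lab's files are not at hand, you can try the same exploration on an in-memory CSV. This sketch rebuilds the CSV rows shown in the output above using io.StringIO, which stands in for sample_data.csv (an assumption, not the lab's file). It also adds three other common exploration calls: head(), shape and isnull().sum().

```python
import io
import pandas as pd

# In-memory stand-in for sample_data.csv, using the rows shown in the output above
csv_text = """Name,Age,Score
Manoj,19,95
Dilip,20,97
Manjula,40,35
Rakesh,24,45
Kushal,22,80
"""
data_csv = pd.read_csv(io.StringIO(csv_text))

print(data_csv.head(3))          # first three rows
print(data_csv.shape)            # (number of rows, number of columns)
print(data_csv.isnull().sum())   # count of missing values per column
summary = data_csv.describe()    # count, mean, std, min, quartiles, max
print(summary)
```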
Lab Program 5: Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset; it must contain 'Study Hours' and 'Exam Score' columns
data = pd.read_csv('C:\\ML_Projects\\study_data.csv')

plt.figure(figsize=(14, 7))

# Left panel: scatter plot of study hours against exam scores
plt.subplot(1, 2, 1)
plt.scatter(data['Study Hours'], data['Exam Score'], color='cyan', edgecolor='k', alpha=0.7)
plt.title('Study Hours vs. Exam Scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.grid(True)

# Group study hours into 2-hour ranges: [0, 2), [2, 4), ..., [10, 12)
bins = [0, 2, 4, 6, 8, 10, 12]
labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']
data['Study Hours Range'] = pd.cut(data['Study Hours'], bins=bins, labels=labels, right=False)

# Average exam score in each range (observed=False keeps empty ranges in the result)
grouped_data = data.groupby('Study Hours Range', observed=False)['Exam Score'].mean()

# Right panel: bar chart of the average score per range
plt.subplot(1, 2, 2)
grouped_data.plot(kind='bar', color='pink')
plt.title('Average Exam Score by Study Hour Range')
plt.xlabel('Study Hours Range')
plt.ylabel('Average Exam Scores')
plt.xticks(rotation=0)

plt.tight_layout()
plt.show()

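Output: [figure: scatter plot "Study Hours vs. Exam Scores" and bar chart "Average Exam Score by Study Hour Range"]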
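The bar chart depends on the binning step, so this sketch runs just that step on a few invented study-hour records. These are hypothetical data, not the lab's study_data.csv. The output shows what pd.cut with right=False produces: each range includes its left edge and excludes its right edge.

```python
import pandas as pd

# Hypothetical records standing in for study_data.csv
data = pd.DataFrame({
    "Study Hours": [1, 3, 3.5, 5, 6, 9.5, 11],
    "Exam Score": [40, 55, 61, 70, 76, 88, 93],
})

bins = [0, 2, 4, 6, 8, 10, 12]
labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']
# right=False: a value of exactly 6 hours falls in '6-8', not in '4-6'
data['Study Hours Range'] = pd.cut(data['Study Hours'], bins=bins, labels=labels, right=False)

grouped = data.groupby('Study Hours Range', observed=False)['Exam Score'].mean()
print(data)
print(grouped)
```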
Lab Program 6: Write a program to handle missing data, encode categorical variables, and perform feature scaling.

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Sample data with missing values (None) in the numeric columns
data = {
    'Age': [25, 30, None, 28, 35],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'Income': [50000, 60000, 45000, None, 70000]
}
df = pd.DataFrame(data)

# Handle missing data: replace each missing value with the mean of its column
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Income']] = imputer.fit_transform(df[['Age', 'Income']])

# Print data after handling missing values
print("Data after handling missing values:")
print(df)

# Encode the categorical 'Gender' column as one-hot (0/1) columns
encoder = OneHotEncoder()
encoded_data = encoder.fit_transform(df[['Gender']]).toarray()

# Print data after categorical encoding
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['Gender']))
print("\nData after categorical encoding:")
print(encoded_df)

# Feature scaling: standardize each column to zero mean and unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[['Age', 'Income']])

# Print data after feature scaling
scaled_df = pd.DataFrame(scaled_data, columns=['Scaled Age', 'Scaled Income'])
print("\nData after feature scaling:")
print(scaled_df)

Output
Lab Program 7: Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Dummy student data: features are [exam score 1, exam score 2]
X = np.array([[88, 75], [95, 90], [60, 50], [45, 30], [30, 48],
              [85, 95], [70, 60], [50, 55], [40, 45], [60, 70]])
# Labels: 1 = pass, 0 = fail (binary classes for demonstration)
y = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the k-NN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training data
knn.fit(X_train, y_train)

# Evaluate the classifier's performance on the test set
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on the test set: {:.2f}".format(accuracy))

# Take user input for exam scores
exam_score1 = float(input("Enter Exam Score 1: "))
exam_score2 = float(input("Enter Exam Score 2: "))

# Prepare the user input for prediction (one sample with two features)
user_input = np.array([[exam_score1, exam_score2]])

# Use the trained k-NN classifier to predict the outcome
predicted_outcome = knn.predict(user_input)
if predicted_outcome[0] == 1:
    print("Based on the exam scores provided, the student is predicted to pass.")
else:
    print("Based on the exam scores provided, the student is predicted to fail.")

Output:
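To see what KNeighborsClassifier does internally, here is a from-scratch sketch of the k-NN decision rule. It computes the Euclidean distance from the query point to every training point, takes the k closest, and returns their majority label. The function name knn_predict is hypothetical. Scikit-learn uses faster neighbour-search structures, but with its default settings (uniform weights, Euclidean distance) it applies this same rule.

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Hypothetical helper: label of one query point by majority vote of its k nearest neighbours."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(distances)[:k]                         # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                            # most common label among them

# Same dummy student data as the lab: [exam score 1, exam score 2] -> 1 = pass, 0 = fail
X = np.array([[88, 75], [95, 90], [60, 50], [45, 30], [30, 48],
              [85, 95], [70, 60], [50, 55], [40, 45], [60, 70]])
y = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

print(knn_predict(X, y, np.array([90, 85])))  # prints 1: neighbours [95,90], [88,75], [85,95] all pass
print(knn_predict(X, y, np.array([35, 40])))  # prints 0: neighbours [40,45], [30,48], [45,30] all fail
```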

Lab Program 8: Write a program to implement a linear regression model for regression tasks and train the model on a dataset.
# Regression algorithm: simple linear regression Y = a*X + b fitted by least squares
import numpy as np
import matplotlib.pyplot as plt

X = np.array([18, 17, 26, 19, 27, 31, 14, 29, 32, 26])  # Experience in months
Y = np.array([16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000])  # Salary
print("X-values are:")
print(X)
print("Y-values are:")
print(Y)
print()

# Find the mean values of the X and Y data
mean_x = np.mean(X)
print(f"Mean of X is: {mean_x}")
mean_y = np.mean(Y)
print(f"Mean of Y is: {mean_y}")
print()

# Population variance of X and covariance of X and Y (both divide by n)
variance_x = np.var(X)
print(f"Variance of X is: {variance_x}")
covariance = np.sum((X - mean_x) * (Y - mean_y)) / len(X)
print(f"Covariance of X and Y is: {covariance}")
print()

# Find the slope a and the intercept b
a = covariance / variance_x
print("a = covariance / variance_x, so")
print(f"a = {a}")
b = mean_y - a * mean_x
print("b = mean_y - a * mean_x, so")
print(f"b = {b}")
print()

# Predict Y-values for the existing X-values
Y_pred = a * X + b
print(f"Regression Line: Y = {a:.2f}X + {b:.2f}")
print("Y-values obtained are =", Y_pred)
print("And corresponding X-values are =", X)
print()

plt.scatter(X, Y, label="Original Data")
plt.plot(X, Y_pred, color="red", label=f"Regression Line: Y = {a:.2f}X + {b:.2f}")
plt.xlabel("Experience (months)")
plt.ylabel("Salary")
plt.legend()
plt.grid(True)
plt.show()

# Predict the Y-value (salary) for a new X-value (experience).
# Note: 7.5 lies outside the training range (14 to 32 months), so this is an extrapolation.
new_X = 7.5
new_Y = a * new_X + b
print()
print(f"Predict Y-value using Y = {a:.2f}X + {b:.2f} for new X-value = {new_X}")
print(f"Predicted Y-value is = {new_Y:.2f}")
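The slope and intercept computed above are the ordinary least-squares solution: a = cov(X, Y) / var(X) and b = mean(Y) - a * mean(X). As a cross-check, which is an addition and not part of the original program, NumPy's np.polyfit with degree 1 fits the same straight line and should return the same two numbers. The sketch also computes R², a common measure of how well the line fits.

```python
import numpy as np

X = np.array([18, 17, 26, 19, 27, 31, 14, 29, 32, 26])  # Experience in months
Y = np.array([16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000])  # Salary

# Closed-form least squares, as in the lab program (the 1/n factors cancel)
a = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - a * X.mean()

# np.polyfit returns coefficients from the highest power down: [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)

print(f"Closed form: a = {a:.4f}, b = {b:.4f}")
print(f"np.polyfit : a = {slope:.4f}, b = {intercept:.4f}")

# Coefficient of determination: share of the variance in Y explained by the line
Y_pred = a * X + b
r_squared = 1 - np.sum((Y - Y_pred) ** 2) / np.sum((Y - Y.mean()) ** 2)
print(f"R^2 = {r_squared:.3f}")
```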
Lab Program 9: Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Define some features [Temperature, Humidity] and the corresponding classifications
features = np.array([[140, 1], [130, 0], [150, 0], [170, 1], [180, 1], [100, 0], [172, 1]])
classifications = np.array(["play", "don't play", "don't play", "play", "play", "don't play", "play"])

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Train the classifier on the data
clf = clf.fit(features, classifications)

# Predict the class label for a new instance
predictions = clf.predict([[170, 1]])
print("Decision Tree Classifier:")
print("New instance to classify: [170, 1]")
print("Class label for the new instance is:", predictions[0])

# Create a figure and plot the tree.
# class_names must list each class once, in the order of clf.classes_ (sorted labels).
plt.figure(figsize=(5, 8))
plot_tree(clf, feature_names=["Temperature", "Humidity"], class_names=list(clf.classes_),
          filled=True, rounded=True)
plt.show()
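To understand the split the tree chooses, it helps to compute Gini impurity (scikit-learn's default criterion) by hand. In this sketch the helpers gini() and weighted_split_gini() are hypothetical, written for illustration. It compares splitting the lab's data on Humidity with the best available split on Temperature, then prints the fitted tree's rules with sklearn.tree.export_text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

features = np.array([[140, 1], [130, 0], [150, 0], [170, 1], [180, 1], [100, 0], [172, 1]])
labels = np.array(["play", "don't play", "don't play", "play", "play", "don't play", "play"])

def gini(y):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_split_gini(column, threshold):
    """Size-weighted Gini impurity of the two children of 'feature <= threshold'."""
    left = labels[features[:, column] <= threshold]
    right = labels[features[:, column] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(f"Gini at the root: {gini(labels):.3f}")                          # 4 play, 3 don't play
print(f"Split Humidity <= 0.5: {weighted_split_gini(1, 0.5):.3f}")      # both children are pure
print(f"Split Temperature <= 160: {weighted_split_gini(0, 160):.3f}")   # best temperature cut

clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
print(export_text(clf, feature_names=["Temperature", "Humidity"]))
```

Humidity separates the two classes perfectly (impurity 0), so the tree places it at the root. A single split is then enough.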

Lab Program 10: Write a program to implement K-Means clustering and visualize the clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = [[1, 1], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]
print("Considered data for K-Means clustering is:")
print(data)

print("Considered data as a NumPy array is:")
data = np.array(data)
print(data)

print("Assumed K-value is:")
k = 3
print(k)

print("K-Means object is given the following values:")
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
print(kmeans)

# Fit the model: assigns each point to a cluster and computes the centroids
kmeans.fit(data)
print("Integer labels assigned to each data point are:")
labels = kmeans.labels_
print(labels)

print("Calculated centroid points are:")
centroids = kmeans.cluster_centers_
print(centroids)

# Plot the points colored by cluster, and the centroids as red crosses
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=60, marker='x', c='red')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("K-Means clustering (k=" + str(k) + ")")
plt.grid()
plt.show()
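K-Means repeats two steps until the centroids stop moving. First it assigns each point to its nearest centroid, then it moves each centroid to the mean of the points assigned to it; this procedure is known as Lloyd's algorithm. The from-scratch sketch below runs those steps on the same six points. The function name lloyd_kmeans is hypothetical, and the sketch assumes no cluster ever becomes empty. It starts from points 0, 2 and 5 as the initial centroids, an arbitrary choice for illustration. Scikit-learn instead uses k-means++ initialization and keeps the best of n_init runs, so its label numbering may differ.

```python
import numpy as np

def lloyd_kmeans(points, initial_centroids, max_iter=100):
    """Hypothetical helper: plain Lloyd's algorithm; assumes no cluster becomes empty."""
    centroids = initial_centroids.astype(float)
    for _ in range(max_iter):
        # Assignment step: distance from every point to every centroid, keep the closest
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(len(centroids))])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

data = np.array([[1, 1], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
labels, centroids = lloyd_kmeans(data, data[[0, 2, 5]])
print("Labels:", labels)
print("Centroids:\n", centroids)
```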
