Exercise and Experiment 3

Uploaded by

h8792670

WORKING ON DATASET

Definition of Dataset:

• A dataset is a collection of related data that machine learning / AI developers use to train a model.
• In a dataset, each row represents one data point and each column represents one feature.
• Datasets are used in fields such as machine learning, business, and government to gain insights, make informed decisions, or train algorithms.
• Datasets vary in size and complexity, and most require cleaning and preprocessing to ensure data quality and suitability for analysis or modeling.
• Datasets can be stored in multiple formats. The most common are CSV, Excel, and JSON (JavaScript Object Notation), with zip archives used for large datasets such as image datasets.
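
The row/column layout and the storage formats described above can be illustrated with a minimal sketch that uses only the Python standard library. The weather readings are hypothetical values invented for this example.

```python
import csv
import io
import json

# Hypothetical weather readings: each row is one data point,
# each column is one feature.
csv_text = """temperature,humidity,condition
30.5,70,sunny
25.1,85,rainy
28.0,60,cloudy
"""

# Parse the CSV text into a list of dictionaries (one dictionary per row).
rows = list(csv.DictReader(io.StringIO(csv_text)))
features = list(rows[0].keys())

print("Number of data points (rows):", len(rows))
print("Features (columns):", features)

# The same records can be serialised as JSON, another common format.
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Reading from a string through io.StringIO keeps the sketch self-contained; with a real file, open() would take its place.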

Types of datasets:

1. Numerical Dataset: Contains numerical data points that can be measured and used in calculations, such as temperature, humidity, and marks.
2. Categorical Dataset: Contains values that fall into categories, such as colour, gender, occupation, games, and sports.
3. Web Dataset: Created by calling APIs over HTTP and collecting the returned values for data analysis. These are mostly stored in JSON (JavaScript Object Notation) format.
4. Time Series Dataset: Contains observations recorded over a period of time, for example, changes in geographical terrain over time.
5. Image Dataset: Consists of images. One common use is distinguishing between types of diseases, heart conditions, and so on.
6. Ordered Dataset: Contains data ordered by rank, for example, customer reviews and movie ratings.
7. Partitioned Dataset: Has data points segregated into different members or partitions.
8. File-Based Dataset: Stored in files, for example .csv files or Excel .xlsx files.
9. Bivariate Dataset: Contains two variables (features) whose relationship is studied. For example, height and weight in a dataset are directly related to each other.
10. Multivariate Dataset: As the name suggests, contains more than two variables that are related to each other. For example, attendance and assignment grades are both related to a student's overall grade.
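
As a rough illustration of the bivariate case, the strength of the relationship between two features can be measured with the Pearson correlation coefficient. The sketch below is one way to compute it, not a prescribed method. The helper name pearson and the height and weight values are assumptions made up for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical measurements, invented for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 77, 85]

r = pearson(heights_cm, weights_kg)
print(f"Correlation between height and weight: {r:.3f}")
```

A value close to 1 means the two features rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.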

Note : A common convention, followed in this document, is to use 70% of the data in the dataset for training and the remaining 30% for testing the model.
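
The 70/30 split can be sketched with the standard library alone. This is a simplified illustration, and the ten integer "data points" are placeholders. In the exercises in this document the split is done with scikit-learn's train_test_split instead.

```python
import random

# Ten hypothetical data points, represented by the integers 0..9.
data = list(range(10))

# Shuffle a copy with a fixed seed so the split is reproducible.
rng = random.Random(1)
shuffled = data[:]
rng.shuffle(shuffled)

# The first 70% of the shuffled points go to training, the rest to testing.
split_at = len(shuffled) * 7 // 10
train = shuffled[:split_at]
test = shuffled[split_at:]

print("Training points:", train)
print("Testing points :", test)
```

Shuffling before splitting matters when the file is sorted by class, since an unshuffled split could leave a whole class out of the training set.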
Python libraries

Numpy:
• NumPy is a Python library.
• NumPy is used for working with arrays.
• NumPy is short for "Numerical Python".
Eg :
import numpy as np  # "as" gives the numpy library the alias np
arr = np.array([1, 2, 3, 4, 5])  # array() creates an array whose elements all share one type
print(arr)

Output :
[1 2 3 4 5]

Pandas:
• Pandas is a Python library used for working with datasets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• Pandas lets us analyze large datasets and draw conclusions based on statistical theory.
• Pandas can clean messy datasets and make them readable and relevant.
Commonly used pandas functionality includes:

a) Reading CSV (Comma Separated Values) files
b) Checking the correlation between two or more columns
c) Computing the average value
d) Computing the maximum value
e) Computing the minimum value
f) Computing the standard deviation
Let us consider one CSV file

Eg :

import pandas as pd

a = pd.read_csv('eg.csv')

print(a.to_string())

# to_string() prints the entire DataFrame.

Output :
Matplotlib:

• Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
• Most Matplotlib utilities lie under the pyplot submodule, which is usually imported under the plt alias.
Eg :
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Output :
Installing Jupyter Notebook
The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience. JupyterLab is a web-based interface from the same project that can open and run Jupyter notebooks, and it is the tool installed and used below.
Procedure to install JupyterLab

Install JupyterLab with pip:

pip install jupyterlab

Run JupyterLab with the following command in the command prompt:

jupyter lab
Exercise :
Load the iris dataset and calculate the accuracy of a KNN classifier on it in JupyterLab

Code snippet : (In JupyterLab)

#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Read the dataset

iris = pd.read_csv('iris.csv')
iris.head()  # head() displays the first five rows of the dataset

Output :

iris = pd.read_csv('iris.csv')
iris.tail()  # tail() displays the last five rows of the dataset

Output :
# To count how many instances belong to each species
iris['Species'].value_counts()  # value_counts() counts the occurrences of each distinct value
Output :

# To get the column names

iris.columns
Output :

# To get all the values in the dataset as an array

iris.values
Output :

# To get summary information about the dataset (column types, non-null counts)

iris.info()
Output :
# To get descriptive statistics for the numeric columns
iris.describe()
Output :

# To get descriptive statistics for all columns, including non-numeric ones
iris.describe(include='all')

Output :

# Use SepalLengthCm, SepalWidthCm, PetalLengthCm and PetalWidthCm as X and Species as Y
# Selecting by column name keeps any non-feature column (such as an Id column) out of X
X = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]
X.head()  # head() displays the first five rows
Output :
# iloc selects by integer position; the syntax is iloc[rows, columns], where : means all
# Use Species as Y
Y = iris.iloc[:, -1]  # -1 selects the last column
Y.head()
Output :

# Split the data into training (70%) and test (30%) sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
Y_test.shape
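
The iloc[rows, columns] syntax used above can be tried on a tiny table first. The sketch below assumes pandas is installed. The DataFrame and its column names are hypothetical and stand in for the iris data.

```python
import pandas as pd

# A small hypothetical table standing in for the iris data.
df = pd.DataFrame({
    "length": [5.1, 4.9, 6.3],
    "width": [3.5, 3.0, 3.3],
    "label": ["a", "a", "b"],
})

first_two = df.iloc[:, :2]  # all rows, first two columns
last_col = df.iloc[:, -1]   # all rows, last column only
first_row = df.iloc[0]      # first row, all columns

print(first_two)
print(last_col)
print(first_row)
```

Because iloc works purely by position, it silently picks up whatever column sits at that position, which is why checking the result with head() or columns is a useful habit.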

# Now make a model for training and predicting

knnmodel = KNeighborsClassifier(n_neighbors=3)
knnmodel.fit(X_train, Y_train)
Y_predict1 = knnmodel.predict(X_test)  # predict the species for the test features

# Accuracy
from sklearn.metrics import accuracy_score
acc = accuracy_score(Y_test, Y_predict1)
acc
Output :
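
The accuracy computed in the exercise is the fraction of test samples whose predicted label matches the true label. A minimal sketch of that calculation, using made-up labels rather than real model output:

```python
# Hypothetical true and predicted labels, invented for illustration.
y_true = ["setosa", "versicolor", "virginica", "setosa", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "setosa", "virginica"]

# Count the positions where the prediction equals the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print("Correct predictions:", correct, "out of", len(y_true))
print("Accuracy:", accuracy)
```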
Experiment 3 : Implement k-nearest neighbors classification using Python
Aim : To implement the KNN algorithm using Python
Software environment used: Python 3.12
Procedure to work with the KNN algorithm
• This experiment uses the scikit-learn module.
About the scikit-learn module:
• Scikit-learn is an open-source Python library that implements a range of machine learning, pre-processing, cross-validation, and visualization algorithms using a unified interface.
• It provides tools for machine-learning tasks such as classification, regression, and clustering.
Installation of the scikit-learn module
pip install scikit-learn
This code uses the iris dataset, split into a training set (70%) and a test set (30%).

The iris dataset contains the following features:

---> sepal length (cm)
---> sepal width (cm)
---> petal length (cm)
---> petal width (cm)

A sample from the iris dataset has the format [5.4 3.4 1.7 0.2], where

5.4 ---> sepal length (cm)
3.4 ---> sepal width (cm)
1.7 ---> petal length (cm)
0.2 ---> petal width (cm)
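
To make the idea behind k-nearest neighbors concrete, the sketch below classifies the sample [5.4, 3.4, 1.7, 0.2] by majority vote among the k closest training points, measured by Euclidean distance. It is a simplified illustration and not how scikit-learn implements KNeighborsClassifier. The helper name knn_predict and the six training rows are assumptions chosen for this example.

```python
import math
from collections import Counter

# Illustrative training samples:
# [sepal length, sepal width, petal length, petal width]
train_X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

def knn_predict(sample, k=3):
    """Return the majority label among the k training points closest to sample."""
    # Pair each training point's distance to the sample with its label,
    # then sort so the nearest points come first.
    distances = sorted(
        (math.dist(sample, x), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

prediction = knn_predict([5.4, 3.4, 1.7, 0.2])
print("Predicted species:", prediction)
```

With k = 3 the two nearest points are both setosa, so the vote goes to setosa even though the third neighbor belongs to another class.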
Code snippet :

# Import necessary modules

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import random # used to make random numbers.

# Loading data
data_iris = load_iris()

# To get list of target names

label_target = data_iris.target_names
print()
print("Sample Data from Iris Dataset")
print("*"*30)

# Display ten randomly chosen samples from the iris dataset

for i in range(10):
    rn = random.randint(0, len(data_iris.data) - 1)  # randint includes both end points
    print(data_iris.data[rn], "=>", label_target[data_iris.target[rn]])

# Create feature and target arrays

X = data_iris.data
y = data_iris.target

# Split into training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print("The Training dataset length: ", len(X_train))
print("The Testing dataset length: ", len(X_test))
try:
    nn = int(input("Enter number of neighbors :"))
    knn = KNeighborsClassifier(n_neighbors=nn)
    knn.fit(X_train, y_train)

    # Display the score (mean accuracy on the test set)
    print("The Score is :", knn.score(X_test, y_test))

    # Get test data from the user as four comma-separated values, e.g. 5.4,3.4,1.7,0.2
    test_data = [float(value) for value in input("Enter Test Data :").split(",")]

    print()
    v = knn.predict([test_data])
    print("Predicted output is :", label_target[v[0]])
except ValueError:
    # Raised for non-numeric input, an invalid neighbor count,
    # or a wrong number of feature values
    print("Please supply valid input......")

# except is the keyword that introduces the handler in a try-except block. The handler
# runs only if the try block raises a matching exception, and it can name specific
# exception types to catch.

Output :
