FDS LAB PROGRAMS

The document provides a comprehensive guide on installing Python and Jupyter Notebook on Windows, including step-by-step procedures for downloading, running installers, and verifying installations. It also includes tasks for programming in NumPy and Pandas, such as adding borders to arrays, finding unique elements, and creating DataFrames. Additionally, it covers data visualization using Plotly, including creating line charts, bar charts, and pie charts.

Task 1 1.a) Python installation for Windows

Aim: To install Python on Windows

Procedure:
The Python programming language is an increasingly popular choice for both beginners and experienced
developers. Flexible and versatile, Python has strengths in scripting, automation, data analysis, machine
learning, and back-end development.

You’ll need a computer running Windows 10 with administrative privileges and an internet
connection.

Step 1 — Downloading the Python Installer

1. Go to the official Python download page for Windows.


2. Find a stable Python 3 release. This tutorial was tested with Python version 3.10.10.
3. Click the appropriate link for your system to download the executable file: Windows installer
(64-bit) or Windows installer (32-bit).

Step 2 — Running the Executable Installer

1. After the installer is downloaded, double-click the .exe file, for example
python-3.10.10-amd64.exe, to run the Python installer.
2. Select the Install launcher for all users checkbox, which enables all users of the computer to
access the Python launcher application.
3. Select the Add python.exe to PATH checkbox, which enables users to launch Python from the
command line.

4. If you’re just getting started with Python and you want to install it with default features as
described in the dialog, then click Install Now and go to Step 4 - Verify the Python Installation.
To install other optional and advanced features, click Customize installation and continue.
5. The Optional Features include common tools and resources for Python and you can install all of
them, even if you don’t plan to use them.
Select some or all of the following options:

• Documentation: recommended
• pip: recommended if you want to install other Python packages, such as NumPy or pandas
• tcl/tk and IDLE: recommended if you plan to use IDLE or follow tutorials that use it
• Python test suite: recommended for testing and learning
• py launcher and for all users: recommended to enable users to launch Python from the command line
6. Click Next.
7. The Advanced Options dialog displays.
Select the options that suit your requirements:

• Install for all users: recommended if you’re not the only user on this computer
• Associate files with Python: recommended, because this option associates all the Python file types with
the launcher or editor
• Create shortcuts for installed applications: recommended to enable shortcuts for Python applications
• Add Python to environment variables: recommended to enable launching Python
• Precompile standard library: not required, and it might slow down the installation
• Download debugging symbols and Download debug binaries: recommended only if you plan to create
C or C++ extensions

Make note of the Python installation directory in case you need to reference it later.
8. Click Install to start the installation.
9. After the installation is complete, a Setup was successful message displays.
Step 3 — Adding Python to the Environment Variables (optional)

Skip this step if you selected Add Python to environment variables during installation.

If you want to access Python through the command line but you didn’t add Python to your
environment variables during installation, then you can still do it manually.

Before you start, locate the Python installation directory on your system. The following directories
are examples of the default directory paths:

• C:\Program Files\Python310: if you selected Install for all users during installation, then the
directory will be system wide
• C:\Users\Sammy\AppData\Local\Programs\Python\Python310: if you didn’t select Install for all
users during installation, then the directory will be in the Windows user path

Note that the folder name will be different if you installed a different version, but will still start
with Python.

1. Go to Start and enter advanced system settings in the search bar.


2. Click View advanced system settings.
3. In the System Properties dialog, click the Advanced tab and then click Environment
Variables.
4. Depending on your installation:
• If you selected Install for all users during installation, select Path from the list of System Variables and
click Edit.
• If you didn’t select Install for all users during installation, select Path from the list of User Variables and
click Edit.
5. Click New and enter the Python directory path, then click OK until all the dialogs are closed.
Step 4 — Verify the Python Installation

You can verify whether the Python installation is successful either through the command line or
through the Integrated Development Environment (IDLE) application, if you chose to install it.

Go to Start and enter cmd in the search bar. Click Command Prompt.

Enter the following command in the command prompt:

python --version

An example of the output is:

Output

Python 3.10.10

You can also check the version of Python by opening the IDLE application. Go to Start and
enter python in the search bar and then click the IDLE app, for example IDLE (Python 3.10 64-bit).
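The version can also be checked from inside Python itself, for example at the IDLE prompt. The following is a minimal sketch that uses only the standard library; the variable name version is chosen here for illustration.

```python
import platform
import sys

# Version string of the running interpreter, for example "3.10.10"
version = platform.python_version()
print("Python", version)

# Location of the interpreter that is running this code
print("Interpreter:", sys.executable)

# sys.version_info allows a programmatic check for Python 3
if sys.version_info >= (3, 0):
    print("A Python 3 interpreter is installed.")
```

If the printed version differs from the one you installed, another Python installation probably appears earlier in the PATH.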

You can start coding in Python using IDLE or your preferred code editor.

Task 1 1.b) Installation of Jupyter Notebook

Aim: To install Jupyter Notebook on Windows

Description:

Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations, and narrative text. Uses
include data cleaning and transformation, numerical simulation, statistical modeling,
data visualization, machine learning, and much more.
Jupyter has support for over 40 different programming languages and Python is one of
them. Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the
Jupyter Notebook itself.

Installing Jupyter Notebook using Anaconda:

Anaconda is an open-source distribution that bundles Jupyter, Spyder, and other tools
used for large-scale data processing, data analytics, and heavy scientific computing.
Anaconda supports the R and Python programming languages. Spyder, an IDE included
with Anaconda, is used for Python, and OpenCV for Python works in Spyder. Package
versions are managed by the package management system called conda.
To install Jupyter using Anaconda, go through the following steps:
• Launch Anaconda Navigator.
• Click the Install button for Jupyter Notebook.
• The installation begins.
• The packages are loaded.
• The installation finishes.
• Launch Jupyter.

Task-1 2.a) Write a NumPy program to add a border (filled with 0's) around an existing array.
AIM: To create a NumPy program to add a border (filled with 0's) around an existing array.

Description:

program:
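One way to approach this task is with np.pad. The following is a minimal sketch, not necessarily the listing originally used in this lab; the 3x3 array of ones is an assumed example input.

```python
import numpy as np

# Example input: a 3x3 array of ones (an assumed sample array)
x = np.ones((3, 3), dtype=int)
print("Original array:")
print(x)

# pad_width=1 adds one row and one column on every side;
# mode='constant' fills the new cells with constant_values (0 here)
bordered = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print("Array with a border of 0's:")
print(bordered)
```

The result is a 5x5 array whose outer ring is 0 and whose inner 3x3 block is the original array.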
Output:

Task-1 2.b) Write a NumPy program to get the unique elements of an array.

AIM: To Write a NumPy program to get the unique elements of an array.

Description:

Program:
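A minimal sketch for this task, assuming np.unique is the intended tool; the sample arrays are made up for illustration.

```python
import numpy as np

# One-dimensional example (assumed sample values)
x = np.array([10, 10, 20, 20, 30, 30])
unique_1d = np.unique(x)
print("Unique elements of x:", unique_1d)

# For a two-dimensional array, np.unique flattens the input first
y = np.array([[1, 1], [2, 3]])
unique_2d = np.unique(y)
print("Unique elements of y:", unique_2d)
```

np.unique returns the distinct values in sorted order.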
Output:

Task-1 2.c) Write a NumPy program to get the values and indices of the elements
that are bigger than 10 in a given array.

AIM: To write a NumPy program to get the values and indices of the elements that are
bigger than 10 in a given array.

DESCRIPTION:
PROGRAM:
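A minimal sketch for this task using a boolean mask and np.nonzero; the sample array is an assumption made for illustration.

```python
import numpy as np

# Assumed sample array
x = np.array([[0, 10, 20], [20, 30, 40]])
print("Original array:")
print(x)

# Boolean mask that is True where the element is bigger than 10
mask = x > 10

# Values selected by the mask, in row-major order
values = x[mask]
print("Values bigger than 10:", values)

# np.nonzero returns one index array per dimension (rows, columns)
rows, cols = np.nonzero(mask)
print("Row indices:", rows)
print("Column indices:", cols)
```

Each pair (rows[k], cols[k]) gives the position of values[k] in the original array.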
OUTPUT:

Task-2 3.a) Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.

Aim: To Write a Pandas program to create and display a DataFrame from a specified
dictionary data which has the index labels.

Sample DataFrame:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
program:

import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Build the DataFrame, using the labels as the row index
df = pd.DataFrame(exam_data, index=labels)
print(df)

Output :

        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes
d      James    NaN         3      no
e      Emily    9.0         2      no
f    Michael   20.0         3     yes
g    Matthew   14.5         1     yes
h      Laura    NaN         1      no
i      Kevin    8.0         2      no
j      Jonas   19.0         1     yes

Task-2 3.b) Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
Aim: To write a Pandas program to select the rows where the score is missing, i.e. is NaN.

Sample DataFrame:
Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Program:
import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)

# isnull() marks the rows whose score is NaN; the mask selects only those rows
print("Rows where score is missing:")
print(df[df['score'].isnull()])

Output:

Rows where score is missing:


    name  score  attempts qualify
d  James    NaN         3      no
h  Laura    NaN         1      no

Task-3 4.a) Write a Python program to draw a scatter plot with empty circles taking a random
distribution in X and Y and plotted against each other.

Aim: To Write a Python program to draw a scatter plot with empty circles taking a random distribution in X and
Y and plotted against each other.

Code:
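A minimal sketch for this task, assuming matplotlib and NumPy are installed. The sample size of 50, the seed, and the marker size and color are arbitrary choices made here.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random X and Y values drawn from a standard normal distribution
# (the seed and the sample size of 50 are arbitrary choices)
rng = np.random.default_rng(seed=0)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

# facecolors='none' leaves the circles empty; edgecolors draws the outline
fig, ax = plt.subplots()
points = ax.scatter(x, y, s=70, facecolors='none', edgecolors='green')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Scatter plot with empty circles')
plt.show()
```

Changing edgecolors or s adjusts the outline color and the circle size.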

Output:
Task-3 4.b) Write a Python program to draw a pie chart with a title of popularity of programming
languages.

Aim: To Write a Python program to draw a pie chart with a title of popularity of programming languages.

Sample data:
Programming languages: Java, Python, PHP, JavaScript, C#, C++
Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7

Program:

import matplotlib.pyplot as plt

# Data to plot
languages = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
# Matplotlib cycles through these colors when there are more slices than colors
colors = ["green", "blue", "yellow", "red"]
# Explode the 1st slice
explode = (0.1, 0, 0, 0, 0, 0)
# Plot
plt.pie(popularity, explode=explode, labels=languages, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.title("Popularity of Programming Languages")
plt.show()

Output:
TASK-4 5.A) INSTALL PLOTLY
AIM: To install Plotly successfully on our PC

Procedure:

The Plotly Python library is an interactive, open-source plotting library. It is a
helpful tool for visualizing data and understanding it simply and easily.
Plotly Express (plotly.express) is a high-level, easy-to-use interface to Plotly,
built on top of the lower-level graph objects (plotly.graph_objects). Plotly can
plot various types of graphs and charts, such as scatter plots, line charts, bar
charts, box plots, histograms, and pie charts.
Why choose Plotly over other visualization tools or libraries?
• Plotly has hover tool capabilities that help us detect outliers or
anomalies in a large number of data points.
• Its charts are visually attractive and suit a wide range of audiences.
• It allows extensive customization of our graphs, which makes our
plots more meaningful and understandable for others.
That is enough theory; let’s start.
INSTALLATION:

Step 1: Click Start, open the Anaconda Prompt, and type conda install plotly

Step 2: When asked whether to proceed, type y and press Enter.

Plotly is now installed. Open Jupyter Notebook to write and run Plotly programs.

TASK-4 5.B) Create Line Chart, Bar Chart, Pie Chart Using Plotly

Aim: To Create Line Chart, Bar Chart, Pie Chart Using Plotly
1. Line chart:

Program:
import plotly.express as px

x = [1, 2, 3, 4, 5]
y = [1, 3, 4, 5, 6]

fig = px.line(x=x, y=y, title='A simple line graph')
fig.show()

OUTPUT:

A Simple Linear Graph


2. Bar charts: Bar charts are used when we want to compare different
groups of data and make inferences of which groups are highest and
which groups are common and compare how one group is performing
compared to others.
Program:

output:
3. Pie chart: A pie chart represents the distribution of different
variables among total. In the pie chart each slice shows its
contribution to the total amount.
Program:
# import all required libraries
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode

# enable inline rendering in the notebook
init_notebook_mode(connected=True)

# different individual parts in the total chart
countries = ['India', 'Canada', 'Australia', 'Brazil', 'Mexico',
             'Russia', 'Germany', 'Switzerland', 'Texas']

# values corresponding to each individual country present in countries
values = [4500, 2500, 1053, 500, 3200, 1500, 1253, 600, 3500]

# plotting pie chart
fig = go.Figure(data=[go.Pie(labels=countries, values=values)])
fig.show()
TASK-4 5.C) Create Box Plots, Violin Plots, Heatmaps Using Plotly

Aim :- To Create Box Plots, Violin Plots, Heatmaps Using Plotly

1. Box Plots: A box plot, also known as a box-and-whisker plot, displays a
summary of a set of data values through five properties: minimum, first
quartile, median, third quartile, and maximum. In the box plot, a box is
drawn from the first quartile to the third quartile, and a line (vertical,
when the box is drawn horizontally) goes through the box at the median.
In a horizontal box plot, the x-axis shows the data values being plotted,
while the y-axis identifies the variable or group being summarized.
Program:
2. Violin plots

A violin plot is a method of visualizing the distribution of numerical data for
different variables. It is similar to a box plot, but with a rotated density plot on
each side, which gives more information about the density estimate on the
y-axis. The density is mirrored and flipped over and the resulting
shape is filled in, creating an image resembling a violin. The advantage
of a violin plot is that it can show nuances in the distribution that aren’t
perceptible in a box plot. On the other hand, the box plot shows the
outliers in the data more clearly.
Program:

3. Heatmaps

A heatmap is a graphical representation of data that uses colors
to visualize the values of a matrix. Brighter, typically reddish colors
represent more common values or higher activity, while darker colors
represent less common values or lower activity. A heatmap is also
known as a shading matrix.
Program:

TASK-5- 6. A) Develop the Model Simple Linear Regression with Python

Aim: To develop the model Simple Linear Regression with Python

Description:

Simple Linear Regression is a type of regression algorithm that
models the relationship between a dependent variable and a single
independent variable. The relationship shown by a Simple Linear
Regression model is linear, that is, a sloped straight line, hence it is called
Simple Linear Regression.
The key point in Simple Linear Regression is that the dependent
variable must be a continuous/real value. However, the
independent variable can be measured on continuous or categorical
values.
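The straight line can be written as y = b0 + b1 * x, where b1 is the slope and b0 is the intercept. As a small worked example, the sketch below computes both with the ordinary least-squares formulas in NumPy. The five data points are made up so that they lie exactly on y = 1 + 2x.

```python
import numpy as np

# Made-up data that lie exactly on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares slope: covariance of x and y divided by variance of x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# The fitted line passes through the point (mean of x, mean of y)
b0 = y.mean() - b1 * x.mean()

print("slope:", b1)      # 2.0
print("intercept:", b0)  # 1.0
```

For a single feature, scikit-learn's LinearRegression fits the same least-squares line, and it also handles the train/test workflow.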

Program:

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Load the dataset: features are all columns except the last, target is column 1
data_set = pd.read_csv('Salary_Data.csv')
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values

# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)

# Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Prediction of test and training set results
y_pred = regressor.predict(x_test)
x_pred = regressor.predict(x_train)

# Visualizing the training set results
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()

# Visualizing the test set results (test points against the same fitted line)
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Task5 6.b) Develop the Multiple Linear Regression with python
Aim: To Develop the Multiple Linear Regression with python

Description:

Multiple linear regression is used to estimate the relationship between
two or more independent variables and one dependent variable. You can
use multiple linear regression when you want to know:

Program:

# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# importing the dataset
dataset = pd.read_csv("C:/Users/NECG/Desktop/50_StartsUp.csv")
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1]

# one-hot encoding the categorical column at index 3
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])],
                       remainder='passthrough')
x = np.array(ct.fit_transform(x))

# Avoiding the dummy variable trap by dropping the first dummy column
X = x[:, 1:]

# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# training the Multiple Linear Regression model on the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the test set results
y_pred = regressor.predict(X_test)
print(y_test)
print(y_pred)

# mean squared error
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_pred))

# R2 score to measure how well the model fits
from sklearn.metrics import r2_score
print(r2_score(y_test, y_pred))

# Visualizing the test set results. With several features there is no single
# x-axis, so the predicted values are plotted against the actual values;
# points on the red diagonal are predicted exactly.
plt.scatter(y_test, y_pred, color="blue")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color="red")
plt.title("Predicted vs Actual (Test Dataset)")
plt.xlabel("Actual value")
plt.ylabel("Predicted value")
plt.show()

Output:

TASK-6
7. Write a program to implement Logistic Regression.

Aim: To implement Logistic Regression

To understand the implementation of Logistic Regression in Python, we will
use the following example.

Example: A dataset contains information about various users obtained from
social networking sites. A car manufacturer has recently launched a new SUV,
and the company wants to check how many users from the dataset want to
purchase the car.

For this problem, we will build a machine learning model using the Logistic
Regression algorithm. The dataset is shown in the image below. In this
problem, we will predict the purchased variable (dependent
variable) by using age and salary (independent variables).

Program:

# Data pre-processing step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# Extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

# Fitting Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Predicting the test set result
y_pred = classifier.predict(x_test)

# Creating the confusion matrix from the true and predicted labels
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2,
             classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Logistic Regression (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
TASK-6 8. Write a program to implement the Decision Tree Regression model.
Aim: To implement the Decision Tree Regression model.
Program:
# import numpy package for arrays
import numpy as np
# import matplotlib.pyplot for plotting our result
import matplotlib.pyplot as plt
# import pandas for importing csv files
import pandas as pd

# import dataset
# dataset = pd.read_csv('Data.csv')
# alternatively, define the data inline
dataset = np.array(
    [['Asset Flip', 100, 1000],
     ['Text Based', 500, 3000],
     ['Visual Novel', 1500, 5000],
     ['2D Pixel Art', 3500, 8000],
     ['2D Vector Art', 5000, 6500],
     ['Strategy', 6000, 7000],
     ['First Person Shooter', 8000, 15000],
     ['Simulator', 9500, 20000],
     ['Racing', 12000, 21000],
     ['RPG', 14000, 25000],
     ['Sandbox', 15500, 27000],
     ['Open-World', 16500, 30000],
     ['MMOFPS', 25000, 52000],
     ['MMORPG', 30000, 80000]
     ])

# print the dataset
print(dataset)

# select all rows by : and column 1 by 1:2, representing features
X = dataset[:, 1:2].astype(int)
print(X)

# select all rows by : and column 2, representing labels
y = dataset[:, 2].astype(int)
print(y)

# import the regressor
from sklearn.tree import DecisionTreeRegressor

# create a regressor object
regressor = DecisionTreeRegressor(random_state=0)

# fit the regressor with X and y data
regressor.fit(X, y)

# predicting a new value; test the output by changing values, like 3750
y_pred = regressor.predict([[3750]])

# print the predicted price (predict returns a one-element array)
print("Predicted price: %d\n" % y_pred[0])

# arange for creating a range of values from min value of X to max value of X
# with a difference of 0.01 between two consecutive values
X_grid = np.arange(X.min(), X.max(), 0.01)

# reshape the data into a len(X_grid)*1 array,
# i.e. make a column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

# scatter plot for original data
plt.scatter(X, y, color='red')

# plot predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='blue')

# specify title
plt.title('Profit to Production Cost (Decision Tree Regression)')

# specify X axis label
plt.xlabel('Production Cost')

# specify Y axis label
plt.ylabel('Profit')

# show the plot
plt.show()

# import export_graphviz
from sklearn.tree import export_graphviz

# export the decision tree to a tree.dot file
# for visualizing the plot easily anywhere
export_graphviz(regressor, out_file='tree.dot',
                feature_names=['Production Cost'])
TASK-7 9. Write a program to implement Random Forest Classification model

AIM: To implement Python Program on Random Forest Classification model


PROGRAM:

# Import scikit-learn dataset library
from sklearn import datasets

# Load dataset
iris = datasets.load_iris()

# print the label species (setosa, versicolor, virginica)
print(iris.target_names)
# print the names of the four features
print(iris.feature_names)
# print the iris data (top 5 records)
print(iris.data[0:5])
# print the iris labels (0: setosa, 1: versicolor, 2: virginica)
print(iris.target)

# Creating a DataFrame of the iris dataset
import pandas as pd
data = pd.DataFrame({
    'sepal length': iris.data[:, 0],
    'sepal width': iris.data[:, 1],
    'petal length': iris.data[:, 2],
    'petal width': iris.data[:, 3],
    'species': iris.target
})
data.head()

# Import train_test_split function
from sklearn.model_selection import train_test_split
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]  # Features
y = data['species']  # Labels

# Split dataset into training set and test set: 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Import Random Forest model
from sklearn.ensemble import RandomForestClassifier

# Create a random forest classifier with 100 trees
clf = RandomForestClassifier(n_estimators=100)

# Train the model using the training set, then predict on the test set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Predict the species of a single new flower
clf.predict([[3, 5, 4, 2]])

# Importance of each feature in the fitted model, highest first
feature_imp = pd.Series(clf.feature_importances_,
                        index=iris.feature_names).sort_values(ascending=False)
feature_imp

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Creating a bar plot
sns.barplot(x=feature_imp, y=feature_imp.index)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important Features")
plt.show()
output:

TASK-8 10. Write a program to implement the K-Nearest Neighbour(KNN) algorithm to classify the
given dataset

Aim: To Write a program to implement the K-Nearest Neighbour(KNN) algorithm to classify the given
dataset

Program:

# Import necessary modules
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Loading data
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Predict on dataset which model has not seen before
print(knn.predict(X_test))
print(knn.score(X_test, y_test))

# Try K values from 1 to 8 and record the accuracy for each
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

    # Compute training and test data accuracy
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.plot(neighbors, test_accuracy, label='Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label='Training dataset Accuracy')

plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()

OUTPUT:

Task 9 11. Write a program to implement the Naive Bayesian classifier for a simple training data set to be
stored in .CSV file.
Aim: To implement the Naive Bayesian classifier for a simple training data set to be stored in .CSV file.

Program:

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(x_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
X1, X2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2,
             classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

# Visualising the Test set results
x_set, y_set = x_test, y_test
X1, X2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2,
             classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
Task 10 12. Write a program to implement clustering algorithm to cluster the set of
data stored in .CSV file.

Aim: Program to implement clustering algorithm to cluster the set of data stored
in .CSV file.

Program:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Mall_Customers_data.csv')
x = dataset.iloc[:, [3, 4]].values

# Finding a suitable number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss_list = []  # Initializing the list for the values of WCSS

# Using a for loop for iterations from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()

# training the K-means model on a dataset
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)

# visualizing the clusters
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')  # first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')  # second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')  # third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')  # fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')  # fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
