Dsf-Pyt-Lab Manual

Uploaded by

thilakraj.a0321

Ex no: 1 WORKING WITH NUMPY ARRAYS

AIM
To work with NumPy arrays.

ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the array
Step4: Stop

PROGRAM

import numpy as np

# Creating array object
arr = np.array([[1, 2, 3], [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT

Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]

Program to Perform Array Slicing

# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])

Output:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
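
The Ellipsis (`...`) in the program above stands for "all remaining axes". For a 2-D array it behaves like a single `:`, which the following minimal sketch checks on the same array. The variable names are illustrative and not part of the original exercise.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# For a 2-D array, `...` expands to a single `:`
second_column = a[..., 1]
second_row = a[1, ...]
from_column_1 = a[..., 1:]

# Compare each Ellipsis slice with its explicit colon form
same_column = np.array_equal(second_column, a[:, 1])
same_row = np.array_equal(second_row, a[1, :])
same_block = np.array_equal(from_column_1, a[:, 1:])
print(same_column, same_row, same_block)
```

On arrays with more than two dimensions the two forms differ, because `...` then covers several axes at once.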
Result:
Thus the working with Numpy arrays was successfully completed

Ex no: 2 Create a dataframe using a list of elements.

Aim:
To work with Pandas data frames

ALGORITHM
Step1: Start
Step2: Import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM
import numpy as np
import pandas as pd

data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:]))

# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))

# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])
print(pd.DataFrame(my_df))

# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom":"London", "India":"New Delhi",
                       "United States":"Washington", "Belgium":"Brussels"})
print(pd.DataFrame(my_series))

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# Use the `shape` property
print(df.shape)
# Or use the `len()` function with the `index` property
print(len(df.index))

Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
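
The exercise title mentions building a dataframe from a list of elements, which the program above does not show directly. A minimal sketch of that case follows; the fruit names, counts and column labels are hypothetical sample data, not part of the original manual.

```python
import pandas as pd

# Hypothetical sample data, not part of the original exercise
names = ["apple", "banana", "cherry"]

# A flat list becomes a single column
df_from_list = pd.DataFrame(names, columns=["fruit"])

# A list of lists becomes one row per inner list
rows = [["apple", 3], ["banana", 5], ["cherry", 7]]
df_from_rows = pd.DataFrame(rows, columns=["fruit", "count"])

print(df_from_list)
print(df_from_rows)
print(df_from_rows.shape)
```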

Result:
Thus the working with Pandas data frames was successfully completed.

EX. NO.:3 BASIC PLOTS USING MATPLOTLIB

AIM:

To draw basic plots in Python program using Matplotlib

ALGORITHM
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop

Program:3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Output:
Program:3b
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes
ax = plt.gca()
# get command over the individual boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval by which the x-axis sets the marks
plt.xticks(list(range(-3, 10)))

# set the interval by which the y-axis sets the marks
plt.yticks(list(range(-3, 20, 3)))

# legend denotes which color signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate writes any text ON THE GRAPH; xy denotes the position on the graph
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')
plt.show()

Output:

Program:3c
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]

# use fig whenever you want the output in a new window;
# also specify the window size in which the result is displayed
fig = plt.figure(figsize =(10, 10))

# creating multiple plots in a single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)

sub1.plot(a, 'sb')

# sets the subplot x-axis ticks to advance by 1 within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')

sub2.plot(b, 'or')

# sets the subplot x-axis ticks to advance by 2 within the specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')

# can directly pass a list to the plot function instead of adding a reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')

sub4.plot(c, 'Dm')

# similarly we can set the ticks for the y-axis:
# range(start(inclusive), end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without writing plt.show() no plot will be visible
plt.show()
Output:

Result:

Thus basic plots were drawn in a Python program using Matplotlib, and the program was
successfully completed.

Ex. No.:4(a) FREQUENCY DISTRIBUTIONS

AIM:
To count the frequency of occurrence of each word in a body of text, a task often needed
during text processing.

ALGORITHM
Step 1: Start the Program
Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
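# requires the NLTK 'gutenberg' corpus and tokenizer data (install once with nltk.download)
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
wlist = []
for i in range(50):
    wlist.append(token[i])
wordfreq = [wlist.count(w) for w in wlist]
# in Python 3, zip() must be wrapped in list() to print the pairs
print("Pairs\n" + str(list(zip(wlist, wordfreq))))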
Output:
Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1),
('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1),
('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3),
('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1),
('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2),
('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1),
('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1),
('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
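
The program above calls `count()` once per token, which rescans the list each time. As a hedged alternative sketch, the standard-library `collections.Counter` builds the same frequencies in a single pass. The short token list below is hypothetical and stands in for the tokenized corpus, so the example runs without NLTK.

```python
from collections import Counter

# Hypothetical token list standing in for the first tokens of a corpus
tokens = ["SONGS", "OF", "INNOCENCE", "AND", "OF", "EXPERIENCE",
          "SONGS", "OF", "INNOCENCE"]

freq = Counter(tokens)                    # one pass over the tokens
pairs = [(w, freq[w]) for w in tokens]    # same (word, count) pairs as above

print(freq.most_common(2))
print(pairs[:3])
```

With real data, `tokens` would be the list returned by `word_tokenize`.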

Result:
Thus the program to count the frequency of occurrence of words in a body of text, and the
Conditional Frequency Distribution program using Python, were successfully completed.

Ex. No.: 4 (c) VARIABILITY

Aim:
To write a python program to calculate the variance.

ALGORITHM

Step 1: Start the Program
Step 2: Import the statistics module: from statistics import variance
Step 3: Import fractions as parameter values: from fractions import Fraction as fr
Step 4: Create tuples of positive and negative numbers
Step 5: Print the variance of each sample
Step 6: Stop the process

Program:
# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module
from statistics import variance

# importing fractions as parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
           fr(5, 6), fr(7, 8))

# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each sample
print("Variance of Sample1 is %s" % (variance(sample1)))
print("Variance of Sample2 is %s" % (variance(sample2)))
print("Variance of Sample3 is %s" % (variance(sample3)))
print("Variance of Sample4 is %s" % (variance(sample4)))
print("Variance of Sample5 is %s" % (variance(sample5)))

Output :

Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006
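
`statistics.variance` computes the sample variance, that is, the sum of squared deviations from the mean divided by n - 1. The sketch below writes that formula out by hand and checks it on two of the samples above. The helper name `sample_variance` is an assumption made for illustration.

```python
from fractions import Fraction
from statistics import variance

def sample_variance(data):
    """Sample variance: sum of squared deviations divided by (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

sample2 = (-2, -4, -3, -1, -5, -6)
sample4 = (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4),
           Fraction(5, 6), Fraction(7, 8))

print(sample_variance(sample2))   # 3.5
print(sample_variance(sample4))   # 1/45
```

Because `Fraction` arithmetic is exact, the hand-written version returns exactly 1/45 for the fractional sample, matching the library result.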

Result:
Thus the computation for variance was successfully completed.
Ex. No.:4(d) NORMAL CURVE

Aim:
To create a normal curve using python program.
ALGORITHM
Step 1: Start the Program
Step 2: Import package scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualize the distribution
Step 6: Stop the process

Program:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution

data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )

#Visualizing the distribution

sb.set_style('whitegrid')
# recent seaborn versions require x and y as keyword arguments
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
Output:

Result:
Thus the normal curve using python program was successfully
completed.
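
The curve drawn by `norm.pdf` follows the closed-form normal density. As a minimal sketch, the function below writes that formula out with the same mean (5.3) and standard deviation (1) and compares it with the standard-library `statistics.NormalDist`. The function name `normal_pdf` is hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=5.3, sigma=1.0):
    """Density of N(mu, sigma^2) at x, written out from the closed-form formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

peak = normal_pdf(5.3)                                # maximum, at the mean
reference = NormalDist(mu=5.3, sigma=1.0).pdf(5.3)    # library value for comparison
print(round(peak, 4), round(reference, 4))
```

The density is symmetric about the mean, so `normal_pdf(4.3)` and `normal_pdf(6.3)` agree up to floating-point rounding.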

Ex. No.: 4 (e) CORRELATION AND SCATTER PLOTS

Aim:
To write a python program for correlation with scatter plot

ALGORITHM
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2 Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3 Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')

# Plot
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
Output :

RESULT:
Thus the Correlation and scatter plots using python program was successfully
completed.

Ex. No.: 4(f) CORRELATION COEFFICIENT

Aim:
To write a python program to compute correlation coefficient.

ALGORITHM
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

Program:
# Python Program to find correlation coefficient.
import math

# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use formula for calculating correlation coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / float(
        math.sqrt((n * squareSum_X - sum_X * sum_X) *
                  (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output :
0.953463
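
The same coefficient can be cross-checked with the mean-centred form of Pearson's formula, which divides the sum of products of deviations by the square root of the product of the summed squared deviations. This is a minimal sketch; the function name `pearson_r` is an assumption and not part of the original program.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from mean-centred sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
r = pearson_r(X, Y)
print('{0:.6f}'.format(r))   # 0.953463
```

For this data the deviation sums are 60, 90 and 44, so r = 60 / sqrt(90 * 44).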
Result:
Thus the computation for correlation coefficient was successfully completed.
Ex. No.: 4 (g) SIMPLE LINEAR REGRESSION

Aim:
To write a python program for Simple Linear Regression
ALGORITHM
Step 1: Start the Program
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output :

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
Graph:

Result:
Thus the computation for Simple Linear Regression was successfully completed.
EX.NO 5. USE THE STANDARD BENCHMARK DATASET
FOR PERFORMING THE FOLLOWING:
A) UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE,
VARIANCE, STANDARD DEVIATION, SKEWNESS AND KURTOSIS.

AIM:
To explore various commands for doing univariate analytics on the UCI AND PIMA
INDIANS DIABETES data set.

ALGORITHM:
STEP 1: Start the program
STEP 2: Download the UCI AND PIMA INDIANS DIABETES data set from Kaggle.
STEP 3: Read data from the UCI AND PIMA INDIANS DIABETES data set.
STEP 4: Find the mean, median, mode, variance, standard deviation, skewness and kurtosis
of the given data set.
STEP 5: Display the output.
STEP 6: Stop the program.

PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
# Jupyter notebook magic; omit when running as a plain script
%matplotlib inline
from matplotlib.ticker import FormatStrFormatter
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('C:/Users/kirub/Documents/Learning/Untitled Folder/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome'] = df['Outcome'].astype('bool')
df.dtypes['Outcome']
df.info()
df.describe().T

# Frequency
# finding the unique count
df1 = df['Outcome'].value_counts()
# displaying df1
print(df1)
# mean
df.mean()
# median
df.median()
# mode
df.mode()
# variance
df.var()
# standard deviation
df.std()
# kurtosis
df.kurtosis(axis=0, skipna=True)
df['Outcome'].kurtosis(axis=0, skipna=True)
# skewness
# skewness along the index axis, skipping the NA values
df.skew(axis=0, skipna=True)
# find skewness in each row
df.skew(axis=1, skipna=True)

# Pregnancy variable
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(np.round(preg_proportion/sum(preg_proportion),3)*100, dtype=int)
preg = pd.DataFrame({'month': preg_month, 'count_of_preg_prop': preg_proportion,
                     'percentage_proportion': preg_proportion_perc})
preg.set_index(['month'], inplace=True)
preg.head(10)

sns.countplot(x='Outcome', data=df)
# distplot is deprecated in recent seaborn releases; histplot is its replacement
sns.distplot(df['Pregnancies'])
sns.boxplot(data=df['Pregnancies'])

OUTPUT:

RESULT: Thus various commands for doing univariate analytics on the UCI AND
PIMA INDIANS DIABETES data set were explored and successfully executed.
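
To make the univariate measures concrete without needing the CSV file, the following sketch computes them on a small hypothetical list with the standard-library `statistics` module. Skewness and kurtosis are written out as plain moment ratios here; pandas applies bias corrections, so its `skew()` and `kurtosis()` values differ somewhat on small samples.

```python
import statistics as st

# Hypothetical values, standing in for one numeric column
values = [1, 2, 2, 3, 4, 7, 9]

mean = st.mean(values)
median = st.median(values)
mode = st.mode(values)
var = st.variance(values)      # sample variance (n - 1), like df.var()
std = st.stdev(values)         # sample standard deviation, like df.std()

# Moment-based (population) skewness and excess kurtosis
n = len(values)
m2 = sum((v - mean) ** 2 for v in values) / n
m3 = sum((v - mean) ** 3 for v in values) / n
m4 = sum((v - mean) ** 4 for v in values) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(mean, median, mode)
print(round(var, 3), round(std, 3), round(skewness, 3), round(excess_kurtosis, 3))
```

The positive skewness reflects the long right tail produced by the values 7 and 9.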

EX.NO:5. B) BIVARIATE ANALYSIS: LINEAR AND LOGISTIC
REGRESSION MODELING

AIM:
To explore the Linear and Logistic Regression model on the USA HOUSING
AND UCI AND PIMA INDIANS DIABETES data set.

ALGORITHM:
STEP 1: Start the program
STEP 2: Download a data set, such as the housing data set, from Kaggle.
STEP 3: Read data from the downloaded data set.
STEP 4: Fit the linear and logistic regression models using the given data set.
STEP 5: Display the output.
STEP 6: Stop the program.

PROGRAM:
BIVARIATE ANALYSIS GENERAL PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
from matplotlib.ticker import FormatStrFormatter
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('C:/Users/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome']=df['Outcome'].astype('bool')

fig,axes = plt.subplots(nrows=3,ncols=2,dpi=120,figsize = (8,6))

# recent seaborn versions require x to be passed as a keyword argument
plot00=sns.countplot(x='Pregnancies',data=df,ax=axes[0][0],color='green')
axes[0][0].set_title('Count',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.countplot(x='Pregnancies',data=df,hue='Outcome',ax=axes[0][1])
axes[0][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count',fontdict={'fontsize':7})
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10 = sns.distplot(df['Pregnancies'],ax=axes[1][0])
axes[1][0].set_title('Pregnancies Distribution',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][0].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plt.tight_layout()

plot11 = df[df['Outcome']==False]['Pregnancies'].plot.hist(ax=axes[1][1],label='Non-Diab.')
plot11_2=df[df['Outcome']==True]['Pregnancies'].plot.hist(ax=axes[1][1],label='Diab.')
axes[1][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[1][1].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][1].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plot11.axes.legend(loc=1)
plt.setp(axes[1][1].get_legend().get_texts(), fontsize='6') # for legend text
plt.setp(axes[1][1].get_legend().get_title(), fontsize='6') # for legend title
plt.tight_layout()

plot20 = sns.boxplot(df['Pregnancies'],ax=axes[2][0],orient='v')
axes[2][0].set_title('Pregnancies',fontdict={'fontsize':8})
axes[2][0].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][0].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.tight_layout()

plot21 = sns.boxplot(x='Outcome',y='Pregnancies',data=df,ax=axes[2][1])
axes[2][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[2][1].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][1].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
plt.tight_layout()
plt.show()

OUTPUT:

## Blood Pressure variable

fig,axes = plt.subplots(nrows=2,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.distplot(df['BloodPressure'],ax=axes[0][0],color='green')
axes[0][0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0][0].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][0].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.distplot(df[df['Outcome']==False]['BloodPressure'],ax=axes[0][1],color='green',label='Non Diab.')
sns.distplot(df[df.Outcome==True]['BloodPressure'],ax=axes[0][1],color='red',label='Diab')
axes[0][1].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][1].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10=sns.boxplot(df['BloodPressure'],ax=axes[1][0],orient='v')
axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('BP',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()

plot11=sns.boxplot(x='Outcome',y='BloodPressure',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()
plt.show()

OUTPUT:

fig,axes = plt.subplots(nrows=1,ncols=2,dpi=120,figsize = (8,4))

plot0=sns.distplot(df[df['BloodPressure']!=0]['BloodPressure'],ax=axes[0],color='green')
axes[0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot1=sns.boxplot(df[df['BloodPressure']!=0]['BloodPressure'],ax=axes[1],orient='v')
axes[1].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1].set_xlabel('BloodPressure',fontdict={'fontsize':7})
axes[1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()

OUTPUT:
LINEAR REGRESSION MODELLING ON HOUSING DATASET
# Data manipulation libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

USAhousing = pd.read_csv('USA_Housing.csv')
USAhousing.head()
USAhousing.info()
USAhousing.describe()

USAhousing.columns
sns.pairplot(USAhousing)

sns.distplot(USAhousing['Price'])

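# numeric_only=True skips any non-numeric columns, which recent pandas versions otherwise reject
sns.heatmap(USAhousing.corr(numeric_only=True))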

X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
                'Avg. Area Number of Bedrooms', 'Area Population']]
y = USAhousing['Price']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
# print the intercept
print(lm.intercept_)

coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df
predictions = lm.predict(X_test)
plt.scatter(y_test,predictions)
sns.distplot((y_test-predictions),bins=50);
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
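
The three error metrics printed above have simple definitions: MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE. The sketch below computes them by hand on a few hypothetical targets and predictions, which are illustration values and not results from the housing data.

```python
import math

# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error
mse = sum(e * e for e in errors) / len(errors)     # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', round(rmse, 4))
```

RMSE is in the same units as the target, which makes it easier to read than MSE when the target is a price.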

OUTPUT:

RESULT:
Exploring various commands for doing Bivariate analytics on the USA HOUSING Dataset was
successfully executed.

Ex. No.: 7. APPLY SUPERVISED LEARNING ALGORITHMS
AND UNSUPERVISED LEARNING ALGORITHMS WITH ANY DATASET

AIM: To apply supervised learning algorithms with any dataset.

ALGORITHM :

Decision Tree Construction: The algorithm recursively splits the dataset into subsets
based on the features that best separate the classes. It selects the feature that
provides the best split, usually based on criteria like Gini impurity or information
gain.

Decision Tree Pruning (Optional): After constructing the tree, pruning can be
applied to avoid overfitting. Pruning involves removing parts of the tree that do not
provide significant predictive power.

Prediction: To make predictions, the algorithm traverses the tree from the root node
to a leaf node, following the decision rules at each node based on the feature values.
The predicted class is the majority class of the instances in the leaf node.
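
To illustrate the split criterion described above, here is a minimal sketch that scores candidate thresholds on a single feature by weighted Gini impurity and keeps the lowest. The toy samples and the helper names `gini` and `weighted_gini` are assumptions made for illustration; this is not how scikit-learn implements its trees internally.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy data: (feature value, class label)
samples = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (3.0, "b"), (3.5, "b"), (4.0, "a")]

best_threshold, best_score = None, float("inf")
for threshold in sorted({v for v, _ in samples}):
    left = [lab for v, lab in samples if v <= threshold]
    right = [lab for v, lab in samples if v > threshold]
    if not left or not right:
        continue  # a split must leave samples on both sides
    score = weighted_gini(left, right)
    if score < best_score:
        best_threshold, best_score = threshold, score

print(best_threshold, round(best_score, 3))
```

A full tree builder would repeat this search over every feature and then recurse into each side of the chosen split.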

PROGRAM :

# Import necessary libraries

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset

iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Initialize the Decision Tree classifier

clf = DecisionTreeClassifier()

# Train the classifier on the training data

clf.fit(X_train, y_train)

# Make predictions on the testing data

y_pred = clf.predict(X_test)

# Evaluate the accuracy of the classifier

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

OUTPUT :

Accuracy: 1.0

RESULT :

Thus a Decision Tree classifier was trained on the Iris dataset, and its accuracy on the
test data was computed and displayed.

EX.NO:8. APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS
ON UCI DATA SETS.

AIM:
To apply and explore various plotting functions on UCI datasets.

ALGORITHM:

STEP 1: Install the seaborn package and import it.

STEP 2: Visualize normal curves, density or contour plots, correlation and
scatter plots, and histogram plots.
STEP 3: Do 3D plotting using the plotly package.
STEP 4: Stop the program.
PROGRAM:

A. NORMAL CURVES

#seaborn package
import seaborn as sns
flights = sns.load_dataset("flights")
flights.head()
may_flights = flights.query("month == 'May'")
sns.lineplot(data=may_flights, x="year", y="passengers")
OUTPUT:
B. DENSITY AND CONTOUR PLOTS

iris = sns.load_dataset("iris")
sns.kdeplot(data=iris)

OUTPUT:

C. CORRELATION AND SCATTER PLOTS

#correlation visualized using heatmap function
df = sns.load_dataset("titanic")
# heatmap needs a numeric matrix, so plot the correlations of the numeric columns
ax = sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")

#scatter plots of categorical variable
df = sns.load_dataset("titanic")
sns.catplot(data=df, x="age", y="class")
OUTPUT:
D. HISTOGRAMS

#histogram of dataframe
df = sns.load_dataset("titanic")
sns.histplot(data=df, x="age")

OUTPUT:

E. THREE DIMENSIONAL PLOTTING

#3d plotting using plotly package
import plotly.express as px
df = sns.load_dataset("iris")

# column names and species labels as they appear in seaborn's iris dataset
fig = px.scatter_3d(df, x="petal_length", y="petal_width",
                    z="sepal_width", size="sepal_length",
                    color="species", color_discrete_map = {"setosa": "blue",
                    "versicolor": "violet", "virginica": "pink"})
fig.show()
OUTPUT:
