DSF Lab Exp Full

The document outlines a series of exercises focused on using Python for data analytics, including installation of packages like NumPy, SciPy, Pandas, and Matplotlib. It provides step-by-step procedures for creating and manipulating arrays, data frames, and various plots, as well as performing statistical calculations such as mean, mode, and standard deviation. Each exercise concludes with a verification of the output, demonstrating successful implementation of the tasks.

Uploaded by

HOD EEE PET Engineering College

EX.NO: 1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF PYTHON FOR DATA ANALYTICS

PROCEDURE:
Python pip is the package manager for Python packages. We can use pip to install packages that
do not come with Python. The basic syntax of pip commands in the command prompt is:
pip <arguments>
Step1:

Step2:
pip comes pre-installed with Python 3.4 and newer versions. To check whether pip is
installed, type the command below in the terminal.
pip --version
If pip is already installed on the system, this command prints its version.

Step3:
Before upgrading, check the current pip version by running pip --version.
On Windows, open the Command Prompt and run the following
command to upgrade pip to the latest available version:
# Upgrade to latest available version
python -m pip install --upgrade pip

Step4:
Check the version of pip again; it now shows the updated version.

Step5:
Install NumPy with either of the following commands:
pip install numpy
pip3 install numpy
Pip downloads the NumPy package and notifies you that it has been successfully installed.

Step6:
Install SciPy:
pip install scipy
Pip downloads the SciPy package and notifies you that it has been successfully installed.

Step7:
Install pandas:
pip install pandas
Pip downloads the pandas package and notifies you that it has been successfully installed.

Step8:
Install Matplotlib:
pip install matplotlib
Pip downloads the matplotlib package and notifies you that it has been successfully installed.

Step9:
Install seaborn:
pip install seaborn

Step10:
Finally, type the command python to start the interpreter and import the packages using the import command.
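
The final check can also be scripted. The sketch below is not part of the original exercise: it assumes the five package names installed in the steps above and uses a helper name (check_packages) chosen here for illustration. It reports which packages are importable without actually importing them.

```python
import importlib.util


def check_packages(names):
    """Map each package name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Package names taken from Steps 5 to 9 of this exercise.
    wanted = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]
    for name, found in check_packages(wanted).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing can be installed with the matching pip install command from the steps above.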

Result:
Thus downloading, installing and exploring the features of Python for data analytics was
successfully implemented.
EX.NO: 2 WORKING WITH NUMPY ARRAYS

ALGORITHM:

Step 1: Start the program

Step 2: import numpy as np
Step 3: Define a one-, two- or multidimensional array
Step 4: Perform operations using NumPy arrays
Step 5: Display the Output
Step 6: Stop the Program

CODE:
1) To create an array from a list
import numpy as np
# avoid naming a variable "list": it would shadow the built-in list type
arr=np.array([1,2,3,4,5])
print(arr)

2) To create one dimensional array

import numpy as np
values=[1,2,3,4]
sample_array=np.array(values)
print("List in python:",values)
print("Numpy array in python:",sample_array)
3) To create multidimensional array
list_1=[1,2,3,4]
list_2=[5,6,7,8]
list_3=[9,10,11,12,]
sample_array1=np.array([list_1,list_2,list_3])
print("Numpy multidimensional array in python \n",sample_array1)

4) To add two array values

a1=np.arange(1,7).reshape(2,3)
a2=np.arange(8,14).reshape(2,3)
d=np.add(a1,a2)
print(d)

5) To print min,max values

print(np.max(sample_array1))
print(np.min(sample_array1))
6. To append a value to the end of an array
arr1=np.array([1,6,7,34,100])
append_val=np.append(arr1, 99)
print(append_val)

7. To print a range of values

arange_val=np.arange(11)
print(arange_val)

8. To print a range of values given start and stop

arange_val_start_and_stop=np.arange(1,10)
print(arange_val_start_and_stop)

9. To print a range of values given start, stop and step

arange_val_start_stop_and_step=np.arange(1,10,2)
print(arange_val_start_stop_and_step)

10. To print the round values of an array

around_array=np.array([3.4, 1.1, 5.5, 6.7])
print(np.around(around_array))

11. To generate random floating samples

rand_float = np.random.rand(4)
print(rand_float)

12. To generate 2 dimensional array

array_2d = np.array([[1,2,3], [4,5,6]])
print(array_2d)

13. To print the shape of the array

print(array_2d.shape)
#Output will be (2,3), which indicates 2 rows and 3 columns

14. To create and print an array containing only zeros

array_zero_only = np.zeros((5,5),dtype='int') #5 rows and 5 column
print(array_zero_only)

15. To print the mean of the array

array1 = np.array([1,16,4,25,9])
mean_array = np.mean(array1)
print(mean_array)

16. To print the median of the array

array1 = np.array([1,16,4,25,9])
median_array = np.median(array1)
print(median_array)

17. Reshape the array

#Here randomly generate 4x3 integer matrix
array1=np.random.randint(15,size=(4,3))
print("Before Reshape:",array1)
print(" ")
array_reshape = array1.reshape(3,4)
print("After Reshape:",array_reshape)

18. Print array based on the condition

array=np.arange(1,11)
filter_val = array[np.where(array>5)]
print(filter_val)

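19. Compare two arrays element by element (np.equal returns True at each position where the values match; np.array_equal gives a single True/False for the whole arrays)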

array1=np.array([1,2,3,4,5])
array2=np.array([1,2,3,4,5])
print(np.equal(array1, array2))

20. To repeat the value for the number of times

print_100_five_times = np.repeat('100', 5)
print(print_100_five_times)

21. To print the standard deviation and variance of the array

array1=np.array([1,2,3,4,5,6])
std_of_array1=np.std(array1)
variance_of_array1=np.var(array1)
print("Standard Deviation of the array is :", std_of_array1)
print("Variance of the array is :", variance_of_array1)

RESULT:

Thus the python program for creating numpy array using different functions has been
done and the output has been verified.
Ex.No.3 WORKING WITH PANDAS DATA FRAMES

AIM:

To create a DataFrame from a single list or a list of lists, locate a row, and create and locate
named indexes.

CODE 1:
A DataFrame can be created using a single list.
import pandas as pd
lst = ['python', 'For', 'first', 'year',
'students', 'interesting',
'programs']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

CODE 2:

Locate Row:

import pandas as pd
lst = ['python', 'For', 'first', 'year',
'students', 'interesting',
'programs']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
# loc[0] returns the row whose index label is 0
print(df.loc[0])

CODE 3:
Named Indexes,Locate Named Indexes
import pandas as pd
lst = {
"list1":['python', 'For', 'first'],
"list2":['students', 'interesting', 'programs']
}
# Calling DataFrame constructor on list
df = pd.DataFrame(lst, index = ["day1", "day2", "day3"])
print(df)
print(df.loc["day2"])
RESULT:

Thus the python program for creating dataframes using different functions has been done and the
output has been verified.
Ex.No.4 BASIC PLOTS USING MATPLOTLIB

AIM:

To draw various plots like Line Plot, Bar Graph, Histogram, Scatter Plot, Area
Plot and Pie Chart.

CODE 1

LINE PLOT:
from matplotlib import pyplot as plt

#Plotting to our canvas
plt.plot([1,2,3],[4,5,1])

#Showing what we plotted
plt.show()

CODE 2:
from matplotlib import pyplot as plt
plt.bar([0.25,1.25,2.25,3.25,4.25],[50,40,70,80,20],
label="BMW",width=.5)
plt.bar([.75,1.75,2.75,3.75,4.75],[80,20,20,50,60],
label="Audi", color='r',width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()

CODE 4:
HISTOGRAM
import matplotlib.pyplot as plt
population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()

CODE 5:
AREA PLOT
import matplotlib.pyplot as plt
days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]

plt.plot([],[],color='m', label='Sleeping', linewidth=5)

plt.plot([],[],color='c', label='Eating', linewidth=5)
plt.plot([],[],color='r', label='Working', linewidth=5)
plt.plot([],[],color='k', label='Playing', linewidth=5)

plt.stackplot(days, sleeping,eating,working,playing, colors=['m','c','r','k'])

plt.xlabel('x')
plt.ylabel('y')
plt.title('Stack Plot')
plt.legend()
plt.show()

CODE 6:

PIE CHART
import matplotlib.pyplot as plt
days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]
slices = [7,2,2,13]
activities = ['sleeping','eating','working','playing']
cols = ['c','m','r','b']
plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow=True,
explode=(0,0.1,0,0),
autopct='%1.1f%%')
plt.title('Pie Plot')
plt.show()
RESULT: Thus the program for creating different plots using matplotlib has been done and the
output has been verified.

Ex.no 5 STATISTICAL AND PROBABILITY MEASURES

5. a) Frequency distributions

Aim:

To count the frequency of occurrence of words in a body of text, a task that is often needed during text
processing.

ALGORITHM

Step 1: Start the Program

Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of
text

Step 5: Print the result

Step 6: Stop the process

Program:
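# Requires the NLTK 'gutenberg' corpus and the tokenizer data (see nltk.download())
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)

# keep only the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])

# frequency of each word within those 50 tokens
wordfreq = [wlist.count(w) for w in wlist]

# zip() is lazy in Python 3, so convert it to a list before printing
print("Pairs\n" + str(list(zip(token, wordfreq))))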
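
The same counting idea can be tried without NLTK. The sketch below is an illustration only: it swaps in collections.Counter from the standard library and plain whitespace splitting instead of word_tokenize, and the sample sentence and function name are made up for the example.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each lower-cased, whitespace-separated word occurs."""
    return Counter(text.lower().split())


# Hypothetical sample text standing in for the corpus file.
sample_text = "the tiger and the lamb and the rose"
freq = word_frequencies(sample_text)

# most_common(n) lists the n most frequent words with their counts.
print(freq.most_common(2))
```

Counter counts each distinct word once, so it avoids the repeated list.count() calls used in the program above.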

Result:
Thus the frequency distribution program, which counts the frequency of occurrence of words in a body of
text, was successfully completed using Python.

5. b) MEAN, MODE, STANDARD DEVIATION

AIM:
To write a python program to calculate the Mean, Mode and Standard Deviation.

ALGORITHM:
Step 1: Start the Program
Step 2: Write the code to calculate Mean, Mode and Standard Deviation.

Step 3: Print the result

Step 4: Stop the process

1. Mean:

The mean is the average of all numbers and is sometimes called the arithmetic mean. This code calculates
Mean or Average of a list containing numbers:

CODE
# mean of elements

# list of elements to calculate mean

n_num = [1, 2, 3, 4, 5]
n = len(n_num)

get_sum = sum(n_num)
mean = get_sum / n

print("Mean / Average is: " + str(mean))

2 Mode :
The mode is the number that occurs most often within a set of numbers. This code calculates Mode of a list
containing numbers:
# Python program to print
# mode of elements
from collections import Counter

# list of elements to calculate mode

n_num = [1, 2, 3, 4, 5, 5]
n = len(n_num)

data = Counter(n_num)
get_mode = dict(data)
# keep every value whose count equals the highest count
mode = [k for k, v in get_mode.items() if v == max(list(data.values()))]

# if every value is a "mode", all values occur equally often
if len(mode) == n:
    get_mode = "No mode found"
else:
    get_mode = "Mode is / are: " + ', '.join(map(str, mode))

print(get_mode)
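
As a cross-check, the standard library offers statistics.multimode (Python 3.8 or newer), which returns all most-frequent values as a list. This is an alternative to the Counter-based code above, not the method used in the exercise; the variable names are chosen for illustration.

```python
from statistics import multimode

single_mode = multimode([1, 2, 3, 4, 5, 5])   # one value occurs most often
two_modes = multimode([1, 1, 2, 2, 3])        # two values tie for most frequent
all_unique = multimode([1, 2, 3])             # every value occurs once

print(single_mode)
print(two_modes)
print(all_unique)
```

Note the difference in behaviour: when every value occurs equally often, multimode returns all of them, whereas the program above reports "No mode found".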

3. Standard Deviation
Standard deviation is a measure of spread in statistics. It quantifies the variation of a set of data
values. It is closely related to variance: the variance is the average squared deviation from the mean, and
the standard deviation is its square root, so it is expressed in the same units as the data.
A low standard deviation indicates that the data are clustered close to the mean, whereas a high
standard deviation shows that the data are spread far from the mean.
# Python code to demonstrate stdev() function

# importing Statistics module

import statistics

# creating a simple data - set

sample = [1, 2, 3, 4, 5]

# Prints standard deviation
# xbar is left at its default (None), so stdev() computes the sample mean itself
print("Standard Deviation of sample is %s "
% (statistics.stdev(sample)))
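
To make the formula behind stdev() concrete, the sketch below recomputes it by hand for the same sample and contrasts it with the population version. It is a worked illustration under the usual definitions (divide by n - 1 for a sample, by n for a population); the variable names are chosen here.

```python
import math
import statistics

sample = [1, 2, 3, 4, 5]
n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the mean.
squared_dev = sum((x - mean) ** 2 for x in sample)

sample_sd = math.sqrt(squared_dev / (n - 1))   # what statistics.stdev() computes
population_sd = math.sqrt(squared_dev / n)     # what statistics.pstdev() computes

print(sample_sd, statistics.stdev(sample))
print(population_sd, statistics.pstdev(sample))
```

NumPy's np.std divides by n by default (ddof=0), so it matches the population value; pass ddof=1 to get the sample value.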

Result:

Thus the Mean, Mode and Standard Deviation were successfully calculated using Python code.

5. c) VARIABILITY

Aim:
To write a python program to calculate the variance.

ALGORITHM

Step 1: Start the Program

Step 2: Import statistics module from statistics import variance
Step 3: Import fractions as parameter values from fractions import Fraction as fr
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each samples
Step 6: Stop the process
Program:
# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module
from statistics import variance

# importing fractions as parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers

# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers

sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers

# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers

sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))

# tuple of a set of floating point values

sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each samples

print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))
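
One detail worth seeing in isolation is that variance() keeps Fraction inputs exact instead of converting them to floats. The small sketch below uses three illustrative fractions (not the sample4 values above) and checks the result against the hand calculation: the mean is 7/9, the squared deviations sum to 402/324, and dividing by n - 1 = 2 gives 67/108.

```python
from fractions import Fraction
from statistics import variance

values = [Fraction(1, 6), Fraction(1, 2), Fraction(5, 3)]

# Sample variance, computed exactly because every input is a Fraction.
exact_variance = variance(values)

print(exact_variance)          # an exact fraction
print(float(exact_variance))   # the same value as a float
```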

Result:

Thus the python program to calculate the variance was successfully implemented.

5.d) NORMAL CURVES

a) Aim:
To create a normal curve using python program.

ALGORITHM

Step 1: Start the Program

Step 2: Import the scipy package and use norm from scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

Program:

# import required libraries

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution

data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )

#Visualizing the distribution

sb.set_style('whitegrid')
# recent seaborn versions expect x and y as keyword arguments
sb.lineplot(x = data, y = pdf , color = 'black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
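
The curve drawn above follows the normal density formula f(x) = exp(-((x - mean)/sd)^2 / 2) / (sd * sqrt(2*pi)). The sketch below evaluates that formula directly with the standard library and compares it with statistics.NormalDist (Python 3.8 or newer), assuming the same mean 5.3 and standard deviation 1 as the program; the function name is chosen for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


mean, sd = 5.3, 1.0

# The density is highest at the mean and symmetric around it.
peak = normal_pdf(mean, mean, sd)
left = normal_pdf(mean - 1, mean, sd)
right = normal_pdf(mean + 1, mean, sd)

print(peak, NormalDist(mean, sd).pdf(mean))
print(left, right)
```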

Result :

Thus a normal curve was successfully created using a python program.

5.e) CORRELATION AND SCATTER PLOTS

AIM
To write a python program for correlation with scatter plot

ALGORITHM

Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

Program:

# Scatterplot and Correlations

import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x*5 + 9
y2 = -5*x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0,1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0,1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0,1], 2)}')

# Title, legend and display
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

Result:

Thus the Correlation and scatter plots using python program was successfully completed.

5.f) CORRELATION COEFFICIENT

Aim:
To write a python program to compute correlation coefficient.

ALGORITHM

Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

Program:

# Python Program to find correlation coefficient.

import math

# function that returns correlation coefficient.

def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use formula for calculating correlation coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / float(
        math.sqrt((n * squareSum_X - sum_X * sum_X)
                  * (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Result:

Thus the computation for correlation coefficient was successfully completed


5.g) REGRESSION

Aim:
To write a python program for Simple Linear Regression

ALGORITHM

Step 1: Start the Program

Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

Program:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Result:

Thus the computation for Simple Linear Regression was successfully completed.
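
Once the coefficients are known, the fitted line can be used to predict a response for a new x. The sketch below repeats the least-squares formulas from the program in plain Python on the same data and then predicts at x = 10; the function name fit_line and the query value are chosen here for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-deviation and deviation about x, as in estimate_coef().
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    ss_xx = sum(x * x for x in xs) - n * mean_x * mean_x
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]

b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 10   # predicted response at the new point x = 10

print(b0, b1, predicted)
```

For this data the sums work out to SS_xy = 96.5 and SS_xx = 82.5, so the slope is 96.5 / 82.5 and the intercept is 6.5 minus the slope times 4.5.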

Ex.no : 6 a) USE THE STANDARD BENCHMARK DATA SET FOR PERFORMING THE
FOLLOWING UNIVARIATE ANALYSIS:
UNIVARIATE ANALYSIS:
AIM:
To write a python program for univariate analysis on UCI datasets

ALGORITHM:
Step 1: Start the program
Step 2: Write the coding
Step 3: calculate the Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and
Kurtosis.
Step 4: Stop the program

Frequency

import pandas as pd
import numpy as np
import statistics as st

# Load the data

df = pd.read_csv("data_desc.csv")
print(df.shape)
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 10 columns):
Marital_status     600 non-null object
Dependents         600 non-null int64
Is_graduate        600 non-null object
Income             600 non-null int64
Loan_amount        600 non-null int64
Term_months        600 non-null int64
Credit_score       600 non-null object
approval_status    600 non-null object
Age                600 non-null int64
Sex                600 non-null object
dtypes: int64(5), object(5)
memory usage: 47.0+ KB
None

Mean

# numeric_only=True skips the text (object) columns, which have no mean
df.mean(numeric_only=True)

It is also possible to calculate the mean of a particular variable in the data, as shown below, where we
calculate the mean of the variables 'Age' and 'Income'.

print(df.loc[:,'Age'].mean())
print(df.loc[:,'Income'].mean())

It is also possible to calculate the mean of the rows by specifying the (axis = 1) argument. The code
below calculates the mean of the first five rows.

df.mean(axis = 1, numeric_only = True)[0:5]

Median

df.median(numeric_only = True)

Mode

df.mode()

Variance

df.var(numeric_only = True)

Standard Deviation

df.std(numeric_only = True)

Skewness and Kurtosis

Skewness

print(df.skew(numeric_only = True))

Kurtosis

print(df.kurt(numeric_only = True))

df.describe()
df.describe(include='all')

Result:

Thus the Univariate and Multiple Regression Analysis using the diabetes data set from
UCI and Pima Indians Diabetes data set was completed and verified successfully.

6.b) BIVARIATE ANALYSIS:

AIM:

To write a program for bivariate Analysis

ALGORITHM:

Step 1: Start the program

Step 2: Write the coding
Step 3: Bivariate Analysis: Linear and logistic regression modelling.
Step 4: Stop the program

Linear Regression

import matplotlib.pyplot as plt

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

Logistic Regression
import numpy
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X,y)

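def logit2prob(logr, X):
    # linear part of the model: the log-odds of class 1
    log_odds = logr.coef_ * X + logr.intercept_
    # convert log-odds to odds, then odds to probability
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability

print(logit2prob(logr, X))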
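
The conversion from log-odds to probability used in logit2prob is the logistic (sigmoid) function. The sketch below shows, with the standard library only, that odds / (1 + odds) and 1 / (1 + exp(-log_odds)) are the same quantity; the function names are chosen for illustration.

```python
import math


def sigmoid(log_odds):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_via_odds(log_odds):
    """Same conversion written as in logit2prob: odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), probability_via_odds(z))
```

A log-odds of 0 corresponds to a probability of 0.5, positive log-odds give probabilities above 0.5, and negative log-odds give probabilities below 0.5.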

Multiple Regression analysis

Multiple regression works by considering the values of the available multiple independent
variables and predicting the value of one dependent variable.

import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm

data = {
'year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'index_price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}

df = pd.DataFrame(data)

x = df[['interest_rate','unemployment_rate']]
y = df['index_price']

# with sklearn
regr = linear_model.LinearRegression()
regr.fit(x, y)

print('Intercept: \n', regr.intercept_)

print('Coefficients: \n', regr.coef_)

# with statsmodels
x = sm.add_constant(x) # adding a constant

model = sm.OLS(y, x).fit()

predictions = model.predict(x)

print_model = model.summary()
print(print_model)

Result:

Thus the Bivariate and Multiple Regression Analysis using the diabetes data set
from UCI and Pima Indians Diabetes data set was completed and verified successfully.

EX.NO 7. APPLY SUPERVISED LEARNING ALGORITHMS AND UNSUPERVISED LEARNING ALGORITHMS ON IRIS DATASET.

AIM: To write a program to apply supervised learning algorithms and unsupervised learning algorithms
on the iris dataset.
ALGORITHM:

WHAT IS SUPERVISED LEARNING?

Supervised learning is a machine learning task where an algorithm is trained to find patterns using a
labelled dataset. The supervised learning algorithm uses this training to make input-output inferences on future
datasets. In the same way a teacher (supervisor) would give a student homework to learn and grow
knowledge, supervised learning gives algorithms datasets so that they too can learn and make inferences.

pip install pandas

pip install matplotlib
pip install scikit-learn

from sklearn import datasets

import pandas as pd
import matplotlib.pyplot as plt

# Loading IRIS dataset from scikit-learn object into iris variable.

iris = datasets.load_iris()

# Prints the type of the iris object

print(type(iris))
# a Bunch object (a dictionary-like container used by scikit-learn)

# prints the dictionary keys of iris data

print(iris.keys())

# prints the type/type object of given attributes

print(type(iris.data), type(iris.target))

# prints the no of rows and columns in the dataset

print(iris.data.shape)

# prints the target set of the data

print(iris.target_names)

# Load iris training dataset

X = iris.data

# Load iris target set

Y = iris.target

# Convert datasets' type into dataframe

df = pd.DataFrame(X, columns=iris.feature_names)

# Print the first five tuples of dataframe.

print(df.head())

from sklearn import datasets

from sklearn.neighbors import KNeighborsClassifier

# Load iris dataset from sklearn

iris = datasets.load_iris()

# Create an instance of the KNN classifier with 6 neighbors.
knn = KNeighborsClassifier(n_neighbors=6)

# Fit the model with training data and target values

knn.fit(iris['data'], iris['target'])

# Provide data whose class labels are to be predicted

X=[
[5.9, 1.0, 5.1, 1.8],
[3.4, 2.0, 1.1, 4.8],
]

# Prints the data provided

print(X)

# Store predicted class labels of X

prediction = knn.predict(X)

# Prints the predicted class labels of X

print(prediction)
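
To see what a nearest-neighbour classifier does internally, the sketch below implements the idea in plain Python: measure the Euclidean distance from the query to every training point, take the k closest, and return the most common label among them. It is a simplified illustration with made-up two-dimensional points, not scikit-learn's implementation (which, for example, may break ties differently), and it needs Python 3.8 or newer for math.dist.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

near_first_group = knn_predict(points, labels, (1.1, 1.0))
near_second_group = knn_predict(points, labels, (5.0, 4.8))
print(near_first_group, near_second_group)
```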
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
import numpy as np

# Load the diabetes dataset

diabetes = datasets.load_diabetes()

# Use only one feature for training

diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets

diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets

diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object

regr = linear_model.LinearRegression()

# Train the model using the training sets

regr.fit(diabetes_X_train, diabetes_y_train)

# Input data

print('Input Values')
print(diabetes_X_test)

# Make predictions using the testing set

diabetes_y_pred = regr.predict(diabetes_X_test)

# Predicted Data
print("Predicted Output Values")
print(diabetes_y_pred)

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='red', linewidth=1)

plt.show()

Unsupervised learning is a class of machine learning (ML) techniques used to find patterns in data. The
data given to unsupervised algorithms is not labelled, which means only the input variables (x) are
given with no corresponding output variables. In unsupervised learning, the algorithms are left to
discover interesting structures in the data on their own.

# Importing Modules
from sklearn import datasets
import matplotlib.pyplot as plt

# Loading dataset
iris_df = datasets.load_iris()

# Available methods on dataset

print(dir(iris_df))

# Features
print(iris_df.feature_names)

# Targets
print(iris_df.target)

# Target Names
print(iris_df.target_names)

label = {0: 'red', 1: 'blue', 2: 'green'}

# Dataset Slicing
x_axis = iris_df.data[:, 0] # Sepal Length
y_axis = iris_df.data[:, 2] # Petal Length (column 1 holds Sepal Width)

# Plotting
plt.scatter(x_axis, y_axis, c=iris_df.target)
plt.show()

['DESCR', 'data', 'feature_names', 'target', 'target_names']

K-Means Clustering in Python

K-means implementation in Python on GitHub: clustering_iris.py

# Importing Modules
from sklearn import datasets
from sklearn.cluster import KMeans

# Loading dataset
iris_df = datasets.load_iris()

# Declaring Model
model = KMeans(n_clusters=3)

# Fitting Model
model.fit(iris_df.data)

# Predicting a single input

predicted_label = model.predict([[7.2, 3.5, 0.8, 1.6]])

# Prediction on the entire data

all_predictions = model.predict(iris_df.data)

# Printing Predictions
print(predicted_label)
print(all_predictions)
[0]

Result:

Thus the program to apply supervised learning algorithms and unsupervised learning algorithms on the iris
dataset was successfully completed.

EX.NO:8 APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS

To load and quickly visualize the Multiple Features Dataset [1] from the UCI repository, which
is available in mvlearn. This dataset can be a good tool for analyzing the effectiveness of
multiview algorithms. It contains 6 views of handwritten digit images, thus allowing for analysis
of multiview algorithms in multiclass or unsupervised tasks.

a. Normal curves

A probability distribution is a statistical function that describes the likelihood of obtaining the
possible values that a random variable can take. By this, we mean the range of values that a
parameter can take when we randomly pick values from it. Suppose we pick one adult
at random and are asked what his or her height is (assuming gender does not affect height).
There is no way to know the height in advance, but if we have the distribution of heights of adults
in the city, we can bet on the most probable outcome. A Normal Distribution is also known as a
Gaussian distribution or, more familiarly, the Bell Curve. People use these terms interchangeably; they mean
the same thing. It is a continuous probability distribution.

Code:

import numpy as np

import matplotlib.pyplot as plt

# Creating a series of data in the range of 1-50.
x = np.linspace(1,50,200)

#Creating a Function.
def normal_dist(x , mean , sd):
    # normal density: 1/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)/sd)**2)
    prob_density = 1/(sd*np.sqrt(2*np.pi)) * np.exp(-0.5*((x-mean)/sd)**2)
    return prob_density

#Calculate mean and Standard deviation.
mean = np.mean(x)
sd = np.std(x)

#Apply function to the data.
pdf = normal_dist(x,mean,sd)

#Plotting the Results
plt.plot(x,pdf , color = 'red')
plt.xlabel('Data points')
plt.ylabel('Probability Density')
plt.show()

a. Density and contour plots

Contour plots also called level plots are a tool for doing multivariate analysis and visualizing 3-D
plots in 2-D space. If we consider X and Y as our variables we want to plot then the response Z
will be plotted as slices on the X-Y plane due to which contours are sometimes referred as Z-
slices or iso-response.

Contour plots are widely used to visualize density, altitudes or heights of the mountain as well as
in the meteorological department. Due to such wide usage matplotlib.pyplot provides a method
contour to make it easy for us to draw contour plots.

Code:

import matplotlib.pyplot as plt
import numpy as np

feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)

# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)

fig, ax = plt.subplots(1, 1)

Z = np.cos(X / 2) + np.sin(Y / 4)

# plots contour lines
ax.contour(X, Y, Z)

ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')

plt.show()

b. Correlation and scatter plots

Correlation means association. It is a measure of the extent to which two variables are related.

1. Positive Correlation: When two variables increase together and decrease together, they are
positively correlated. ‘1’ is a perfect positive correlation. For example, demand and profit are
positively correlated: the more demand for the product, the more profit.

2. Negative Correlation: When one variable increases as the other decreases, and vice versa,
they are negatively correlated. For example, as the distance between two magnets increases, their
attraction decreases, and vice versa. ‘-1’ is a perfect negative correlation.

3. Zero Correlation (No Correlation): When two variables don’t seem to be linked at all. ‘0’ is
no correlation. For example, the amount of tea you drink and your level of intelligence.
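The three cases can be illustrated numerically with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below uses `np.corrcoef` on synthetic data. The data values, the random seed and the helper name `pearson_r` are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = np.arange(1.0, 101.0)

y_pos = 2.0 * x + 3.0             # rises exactly in step with x
y_neg = -0.5 * x + 10.0           # falls exactly as x rises
y_none = rng.normal(size=x.size)  # random noise, generated independently of x

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

print("positive:", round(pearson_r(x, y_pos), 3))
print("negative:", round(pearson_r(x, y_neg), 3))
print("none:    ", round(pearson_r(x, y_none), 3))
```

The first two coefficients come out at the extremes because the relationships are exactly linear; the third is only close to zero, since a finite random sample rarely gives a coefficient of exactly 0.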

Code:

import pandas as pd
import seaborn as sns

# Load the concrete data set
con = pd.read_csv('concrete.csv')
con

list(con.columns)
con.head()

con['cement'] = con['cement'].astype('category')
con.describe(include='category')

# Scatter plot of coarse aggregate against water
sns.scatterplot(x="water", y="coarseagg", data=con);

# Same scatter plot with a title and an x-axis label
ax = sns.scatterplot(x="water", y="coarseagg", data=con)
ax.set_title("Coarse aggregate vs. water")
ax.set_xlabel("water");

# Scatter plot with a fitted regression line
sns.lmplot(x="water", y="coarseagg", data=con);

a. Histograms:

A histogram represents data that has been grouped into ranges. It is an accurate method for the
graphical representation of a numerical data distribution. It is a type of bar plot where the
X-axis represents the bin ranges while the Y-axis gives the frequency.

Creating a Histogram

To create a histogram, first divide the whole range of values into a series of intervals (bins),
then count the values that fall into each interval. Bins are consecutive, non-overlapping
intervals of a variable. The matplotlib.pyplot.hist() function is used to compute and draw the
histogram of x.
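The binning and counting step can be sketched on its own with `np.histogram`, which returns the counts without drawing anything. The example uses the sample values and bin edges from the histogram listing in this exercise; the variable names `edges`, `counts` and `bin_edges` are illustrative.

```python
import numpy as np

# Sample values from the histogram exercise
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Bin edges define the intervals [0, 25), [25, 50), [50, 75) and [75, 100]
edges = [0, 25, 50, 75, 100]

# counts[i] is the number of values that fall between edges[i] and edges[i + 1]
counts, bin_edges = np.histogram(a, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts.tolist()):
    print(f"{left:>3} to {right:>3}: {n}")
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both. These counts are the bar heights that a histogram of the same data and bins would show.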

Code:

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11,
              20, 51, 5, 79, 31, 27])

# Creating histogram
fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(a, bins=[0, 25, 50, 75, 100])

# Show plot
plt.show()

Code:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25

# Creating histogram
fig, axs = plt.subplots(1, 1, figsize=(10, 7), tight_layout=True)
axs.hist(x, bins=n_bins)

# Show plot
plt.show()

a. Three dimensional plotting

Matplotlib was initially designed with only two-dimensional plotting in mind. Around the time of
the 1.0 release, 3d utilities were built on top of the 2d display, so 3d plotting is available
today. The 3d plots are enabled by importing the mplot3d toolkit. In this exercise, we will deal
with 3d plots using matplotlib.

Code:

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection='3d')

# defining axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
c = x + y

ax.scatter(x, y, z, c=c)

# syntax for plotting
ax.set_title('3d Scatter plot')

plt.show()
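Beyond scatter plots, the same 3d axes can draw a surface over a grid of points. The sketch below uses `plot_surface` on an illustrative function. The grid range, resolution, height function and colour map are all assumptions and are not taken from the original exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid of (x, y) points; the range and resolution are assumed values
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)

# Height of the surface at each grid point (an illustrative function)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Draw the surface and colour it by height
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6, label="z")

ax.set_title("3d Surface plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```

Call plt.show() afterwards to display the figure in a script. Unlike the scatter example, plot_surface needs X, Y and Z as 2-D arrays of the same shape, which is why the grid is built with np.meshgrid.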

Result:

Thus, various plotting functions were applied and explored on UCI data sets.