Salem college of Engineering and Technology Data Science Laboratory

Salem College
of Engineering and Technology
NH-68, Salem-Attur Main Road, Mettupatty, Perumapalayam, Selliamman
Nagar, Salem, Tamil Nadu 636111

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
AND
DEPARTMENT OF INFORMATION TECHNOLOGY

ACADEMIC YEAR : 2023-2024

Lab Manual

CS3361- Data Science Laboratory

(III semester)
Regulation 2021

Prepared By

Ms.S.Namagiri, AP/CSE

Ms.A.Merlin, AP/CSE
Department of CSE & IT

CS3361 DATA SCIENCE LABORATORY                L T P C
                                              0 0 4 2
COURSE OBJECTIVES:
• To understand the Python libraries for data science.
• To understand the basic statistical and probability measures for data science.
• To learn descriptive analytics on benchmark data sets.
• To apply correlation and regression analytics on standard data sets.
• To present and interpret data using visualization packages in Python.
LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
LIST OF EQUIPMENT (30 students per batch):

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh
Note: Example data sets: UCI, Iris, Pima Indians Diabetes, etc.


LIST OF EXPERIMENTS

Ex.No   Name of the Experiment
1       Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages
2       Working with NumPy arrays
3       Working with Pandas data frames
4a      Reading data from text files, Excel and the web
4b      Descriptive analytics on the Iris data set
5.1.a   Univariate Analysis - Pima Indians Diabetes for Diabetic Patients
5.1.b   Univariate Analysis - Pima Indians Diabetes for Non-Diabetic Patients
5.2.a   Bivariate Analysis - Linear Regression - Pima Indians Diabetes for Diabetic Patients
5.2.b   Bivariate Analysis - Linear Regression - Pima Indians Diabetes for Non-Diabetic Patients
5.3     Bivariate Analysis - Logistic Regression - Pima Indians Diabetes
5.4     Multiple Regression Analysis - Pima Indians Diabetes
5.5     Result Analysis
6.1     Normal curves - Iris data set
6.2     Density and contour plots - Pima Indians Diabetes data set
6.3.a   Correlation - Iris data set
6.3.b   Scatter plot - Iris data set
6.4     Histogram - Iris data set
6.5     Three dimensional plotting
7       Visualizing Geographic Data with Basemap


Exno: 1   DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF
Date:     NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

AIM:-
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
DESCRIPTION:-
1. Download the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages from the web.
2. Check the Python version before installing NumPy, because most systems already have Python preinstalled. To check the Python 3 version, run the command: python3 -V (or python -V on Windows).
3. Install the packages on Windows. First install pip, the package manager for installing and managing Python software packages. The easiest way to install NumPy is by using pip.
4. With pip set up, use its command line to install NumPy by typing: pip install numpy (and similarly pip install scipy jupyter statsmodels pandas for the other packages).
FEATURES:-
1. NUMPY
• NumPy stands for Numerical Python.
• NumPy is one of the most commonly used packages for scientific computing in Python.
• It provides a high-performance multidimensional array object (ndarray).
• It contains tools for integrating code from C/C++ and Fortran.
• It serves as an efficient multidimensional container of generic data.
• It adds linear algebra, Fourier transform and random number capabilities.
• It supports broadcasting functions.
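A short sketch of the NumPy features listed above (ndarray, broadcasting, linear algebra, Fourier transforms and random numbers). The arrays and the seed value are arbitrary illustrations.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-D ndarray of float64
print(a.ndim, a.shape, a.dtype)           # 2 (2, 2) float64

# Broadcasting: the row [10, 20] is added to every row of a
b = a + np.array([10.0, 20.0])
print(b)

# Linear algebra: determinant (1*4 - 2*3 = -2, up to rounding) and inverse
det = np.linalg.det(a)
inv = np.linalg.inv(a)

# Fourier transform: the spectrum of a unit impulse is all ones
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])

# Reproducible random numbers from a seeded generator
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)
print(det, inv, spectrum, samples, sep="\n")
```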

2. SCIPY:
• SciPy stands for Scientific Python.
• SciPy is a scientific computation library that uses NumPy underneath.
• It provides additional utility functions for optimization, statistics and signal processing.
• SciPy is useful for solving many mathematical equations and algorithms.
• It is built on top of the NumPy library and extends it with routines for scientific and mathematical computations such as matrix operations, matrix inverse, polynomial equations, LU decomposition, etc.
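The following sketch touches the SciPy features named above: LU decomposition and inverse from scipy.linalg, a one-dimensional minimisation from scipy.optimize, and the normal distribution from scipy.stats. The matrix and the function being minimised are arbitrary examples.

```python
import numpy as np
from scipy import linalg, optimize, stats

A = np.array([[4.0, 3.0], [6.0, 3.0]])

# LU decomposition with partial pivoting: A = P @ L @ U
P, L, U = linalg.lu(A)
print(np.allclose(P @ L @ U, A))     # True

# Inverse of the matrix
A_inv = linalg.inv(A)

# Optimization: the minimum of (x - 2)^2 is at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Statistics: cumulative probability of the standard normal at 0
p = stats.norm.cdf(0.0)
print(A_inv, res.x, p, sep="\n")
```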


3. JUPYTER:-
• Jupyter is a loose acronym for Julia, Python and R.
• These programming languages were the first target languages of the Jupyter application.
• The main components of the environment are the notebooks themselves and the application that runs them.
• Jupyter Notebook is an open-source, web-based interactive environment that allows you to create and share documents containing live code, mathematical equations, graphics, maps, plots, visualizations and narrative text.


4. STATSMODELS:-
• Statsmodels stands for statistical models. It is a Python package that allows users to explore data, estimate statistical models and perform statistical tests.
• An extensive list of descriptive statistics, statistical tests, plotting functions and result statistics is available for different types of data and for each estimator.


5. PANDAS:-
• The name Pandas is derived from "panel data"; the library is often described as the Python Data Analysis Library.
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
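A minimal sketch of the Pandas features listed above: a DataFrame with a customized index, detection and filling of missing data, and reshaping with pivot(). The city/year/sales table is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with a customized (string) index and one missing value
df = pd.DataFrame(
    {"city": ["Salem", "Salem", "Chennai", "Chennai"],
     "year": [2022, 2023, 2022, 2023],
     "sales": [10.0, np.nan, 7.0, 9.0]},
    index=["a", "b", "c", "d"],
)
print(df.loc["c"])              # label-based access through the custom index
print(df.isna().sum())          # number of missing values per column

# Integrated missing-data handling: fill the gap with the column mean
filled = df.fillna({"sales": df["sales"].mean()})

# Reshaping: one row per city, one column per year
wide = filled.pivot(index="city", columns="year", values="sales")
print(wide)
```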


RESULT:-
Thus the above packages were downloaded and installed, and their features were studied.


Ex No:2 WORKING WITH NUMPY ARRAY

Date:

AIM:-

To write a Python program using NumPy arrays.

ALGORITHM:-

1. Import the numpy package.

2. Create an array from a list with type float using array() and print it.

3. Create an array from a tuple using array() and print it.

4. Create a 3*4 array with all zeros using zeros() and print it.

5. Create a constant-value array of complex type using full() and print it.

6. Create a sequence of integers from 0 to 30 with a step of 5 using arange() and print it.

7. Reshape a 3*4 array into a 2*2*3 array using reshape() and print it.

8. Flatten an array using flatten() and print it.

9. Merge arrays using concatenate() and print the result.

10. Split an array using array_split() and print the result.

11. Sort a given array using sort() and print it.

12. Search for a given key value in the array using where() and print its positions.

PROGRAM :

import numpy as np

#1.Creating array from list with type float
a=np.array([[1,2,4],[5,8,7]],dtype='float')
print("\n 1.Array created using passed list:\n",a)

#2.Creating array from tuple
b=np.array((1,3,2))
print("\n 2.Array created using passed tuple:\n",b)

#3.Creating a 3*4 array with all zeros
c=np.zeros((3,4))
print("\n 3.An array initialized with all zeros:\n",c)

#4.Create a constant value array of complex type
d=np.full((3,3),6,dtype='complex')
print("\n 4.An array initialized with all 6's with type complex:\n",d)

#5.Create a sequence of integers from 0 to 30 with steps of 5 (the stop value 30 is excluded)
e=np.arange(0,30,5)
print("\n 5.A sequence array with steps of 5:\n",e)

#6.Reshaping 3*4 array to 2*2*3 array (the total of 12 elements must stay the same)
f=np.array([[1,2,3,4],[5,2,4,2],[1,2,0,1]])
fnew=f.reshape(2,2,3)
print("\n 6.Old array\n",f)
print("\n New array\n",fnew)

#7.Flatten array (returns a 1D copy)
g=np.array([[1,2,3],[4,5,6]])
gflat=g.flatten()
print("\n 7.Old array\n",g)
print("\n Flatten array\n",gflat)

#8.Join array
arr1=np.array([1,2,3])
arr2=np.array([4,5,6])
arr=np.concatenate((arr1,arr2))
print("\n 8.Joined array\n",arr)

#9.Split into 3 parts
h=np.array([1,2,3,4,5,6])
hnew=np.array_split(h,3)
print("\n 9.Split array\n",hnew)

#10.Sort (a 2D array is sorted within each row by default)
i=np.array([3,2,0,1])
print("\n 10.Sorted 1D array:\n",np.sort(i))
j=np.array([[3,2,4],[5,0,1]])
print("\n 10.Sorted 2D array:\n",np.sort(j))

#11.Search: where() returns the indices of the elements equal to 4
k=np.array([1,2,3,4,5,4,4])
knew=np.where(k==4)
print("\n 11.Searched array position:\n",knew)

OUTPUT :
1.Array created using passed list:
[[1. 2. 4.]
 [5. 8. 7.]]

2.Array created using passed tuple:
[1 3 2]

3.An array initialized with all zeros:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

4.An array initialized with all 6's with type complex:
[[6.+0.j 6.+0.j 6.+0.j]
 [6.+0.j 6.+0.j 6.+0.j]
 [6.+0.j 6.+0.j 6.+0.j]]

5.A sequence array with steps of 5:
[ 0  5 10 15 20 25]

6.Old array
[[1 2 3 4]
 [5 2 4 2]
 [1 2 0 1]]

New array
[[[1 2 3]
  [4 5 2]]

 [[4 2 1]
  [2 0 1]]]

7.Old array
[[1 2 3]
 [4 5 6]]

Flatten array
[1 2 3 4 5 6]

8.Joined array
[1 2 3 4 5 6]

9.Split array
[array([1, 2]), array([3, 4]), array([5, 6])]

10.Sorted 1D array:
[0 1 2 3]

10.Sorted 2D array:
[[2 3 4]
 [0 1 5]]

11.Searched array position:
(array([3, 5, 6], dtype=int64),)

RESULT:-
Thus the above program is executed successfully.
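Two variations on the search and sort steps may be useful: np.where() with three arguments replaces values instead of locating them, and the axis argument of np.sort() controls the direction of sorting (the program above sorts along the last axis, that is, within each row). The arrays k and j are reused from the program; np.argsort() is shown as an additional, related function.

```python
import numpy as np

k = np.array([1, 2, 3, 4, 5, 4, 4])
positions = np.where(k == 4)[0]        # indices only, without the tuple wrapper
replaced = np.where(k == 4, -1, k)     # 3-argument form: -1 where True, else k
print(positions)                        # [3 5 6]
print(replaced)                         # [ 1  2  3 -1  5 -1 -1]

j = np.array([[3, 2, 4], [5, 0, 1]])
by_column = np.sort(j, axis=0)          # sort down each column
flat_sorted = np.sort(j, axis=None)     # flatten first, then sort
print(by_column)
print(flat_sorted)                      # [0 1 2 3 4 5]

order = np.argsort(k, kind="stable")    # indices that would sort k
print(k[order])                         # [1 2 3 4 4 4 5]
```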


Ex No:3 WORKING WITH PANDAS DATA FRAMES

Date:

AIM:-

To write a python program using Pandas data frames.

ALGORITHM:-

1.Start the program.

2.Import pandas package.

3.Create two different dictionaries of employee data.

4.Convert the dictionaries into data frames.

5.Use the concat() method to join the two data frames into a single data frame.

6.Stop the program.

PROGRAM:

import pandas as pd

#Define a dictionary containing employee data

data1={'Name':['Jai','Princi','Gaurav','Anuj'],

'Age':[27,24,22,32],

'Address':['Nagpur','Kanpur','Allahabad','Kannuaj'],

'Qualification':['M.sc','MA','MCA','P.hd']}

#Define a dictionary containing employee data

data2={'Name':['Abhi','Ayushi','Dhiraj','Hitesh'],

'Age':[17,14,12,52],

'Address':['Nagpur','Kanpur','Allahabad','Kannuaj'],

'Qualification':['B.tech','BA','B.com','B.hons']}

#Convert the dictionary into Data frame


df=pd.DataFrame(data1, index=[0,1,2,3])

#Convert the dictionary into Dataframe

df1=pd.DataFrame(data2,index=[4,5,6,7])

print ("DataFrame1\n\n",df,"\n\n","DataFrame2\n\n",df1,"\n\n")

#Using a concat() method

frames=[df,df1]

res1=pd.concat(frames)

print ("The concatenating Dataframes 1 and 2 \n\n",res1)

OUTPUT:

DataFrame 1

Name Age Address Qualification

0 Jai 27 Nagpur M.sc

1 Princi 24 Kanpur MA

2 Gaurav 22 Allahabad MCA

3 Anuj 32 Kannuaj P.hd

DataFrame 2

Name Age Address Qualification

4 Abhi 17 Nagpur B.tech

5 Ayushi 14 Kanpur BA

6 Dhiraj 12 Allahabad B.com

7 Hitesh 52 Kannuaj B.hons

The concatenating Dataframes 1 and 2

Name Age Address Qualification

0 Jai 27 Nagpur M.sc

1 Princi 24 Kanpur MA

2 Gaurav 22 Allahabad MCA

3 Anuj 32 Kannuaj P.hd

4 Abhi 17 Nagpur B.tech

5 Ayushi 14 Kanpur BA

6 Dhiraj 12 Allahabad B.com

7 Hitesh 52 Kannuaj B.hons

RESULT:-

Thus the above program is executed successfully.
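When the original row labels are not needed, concat() can renumber the combined frame with ignore_index=True, and the result can then be filtered and sorted. This sketch reuses a few names and ages from the program above; the 18-year threshold is an arbitrary example.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})
df2 = pd.DataFrame({"Name": ["Abhi", "Ayushi"], "Age": [17, 14]})

combined = pd.concat([df1, df2], ignore_index=True)   # new index 0..3
adults = combined[combined["Age"] >= 18]              # boolean row filter
by_age = combined.sort_values("Age")                  # ascending by Age

print(combined)
print(adults)
print(by_age)
```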


Exno: 4a READING DATA FROM TEXT FILE, EXCEL AND WEB

Date:

AIM:-

To write a python program for reading data from text file, excel file and web.

ALGORITHM:-

1. Start the program.

2. Import pandas package.

3. Create the files file.txt and test.xlsx (with a sheet named Employee).

4. Save them in the same directory as the program.

5. Create test.txt and save it in the D:\cse directory.

6. Use open() with read() to read the text files and read_excel() to read the Excel file.

7. Stop the program.

PROGRAM :

import pandas as pd

# Reading text files
with open("file.txt","r") as f:
    print("Reading text file")
    print(f.read())

# Reading excel files (.xlsx files also need the openpyxl package)
df=pd.read_excel('test.xlsx',sheet_name='Employee')
print("Reading excel files")
print(df)

# Reading a file from another location; the raw string r"..." stops "\t"
# in the Windows path from being read as a tab character
with open(r"D:\cse\test.txt","r") as f:
    print("Reading file from web or another location")
    print(f.read())

INPUT:-


OUTPUT:-

Reading text file

Hii ! How are you ?

Reading excel files

S.no Name

0 1 Steve

1 2 Andrea

2 3 Mike

3 4 John

4 5 Max

5 6 Emma

Reading file from web or another location

HI! WELCOME TO FODS LAB

RESULT:-

Thus the above program is executed successfully.
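Note that the third part of the program reads a file from another drive rather than from the web. pandas.read_csv() accepts a URL as well as a local path or an open file object, so reading from the web can be sketched as below. IRIS_URL is an assumed example address of a public CSV copy of the Iris data and load_csv is a hypothetical helper; the offline demonstration uses an in-memory file so that it runs without a network connection.

```python
import io

import pandas as pd

# Assumed example URL of a public CSV copy of the Iris data; any reachable CSV URL works
IRIS_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"


def load_csv(source):
    """pd.read_csv accepts a local path, a URL or an open file-like object."""
    return pd.read_csv(source)


if __name__ == "__main__":
    # Offline check: an in-memory text "file" standing in for a downloaded CSV
    sample = io.StringIO("S.no,Name\n1,Steve\n2,Andrea\n")
    print(load_csv(sample))

    # Reading from the web (needs an internet connection):
    # print(load_csv(IRIS_URL).head())
```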



Ex No:4b EXPLORATORY DESCRIPTIVE ANALYSIS ON IRIS DATA SET


Date:
AIM:-
To write a python program for the exploratory descriptive analysis on Iris data set.
ALGORITHM:-
1. Start the program
2. Import pandas package.
3. Read Iris.csv file.
4. Perform exploratory descriptive analysis on the Iris data set by using
head()
shape (attribute)
info()
describe()
drop_duplicates()
value_counts() methods
5. Stop the program.
PROGRAM:-
import pandas as pd
#reading Iris File
df=pd.read_csv("Irisdata.csv")

#1.print top five rows
print("\nfirst 5 rows\n")
print(df.head())

#2.print number of rows and columns
print("\n rows and columns\n")
print(df.shape)

#3.print column and their datatypes (info() prints directly and returns None)
print("\ncolumn and their datatypes\n")
print(df.info())

#4.print statistical values
print("\n statistical summary\n")
print(df.describe())

#5.checking duplicates: keep the first row of each variety
print("\n checking duplicates\n")
data=df.drop_duplicates(subset="variety")
print(data)

#6.value counts
print("\n variety counts \n")
print(df.value_counts("variety"))


Iris data set:-


s.no sepal.length sepal.width petal.length petal.width variety
1 5.1 3.5 1.4 0.2 Setosa
2 4.9 3 1.4 0.2 Setosa
3 4.7 3.2 1.3 0.2 Setosa
4 4.6 3.1 1.5 0.2 Setosa
5 5 3.6 1.4 0.2 Setosa
6 5.4 3.9 1.7 0.4 Setosa
7 4.6 3.4 1.4 0.3 Setosa
8 5 3.4 1.5 0.2 Setosa
9 4.4 2.9 1.4 0.2 Setosa
10 4.9 3.1 1.5 0.1 Setosa
11 5.4 3.7 1.5 0.2 Setosa
12 4.8 3.4 1.6 0.2 Setosa
13 4.8 3 1.4 0.1 Setosa
14 4.3 3 1.1 0.1 Setosa
15 5.8 4 1.2 0.2 Setosa
……
141 6.7 3.1 5.6 2.4 Virginica
142 6.9 3.1 5.1 2.3 Virginica
143 5.8 2.7 5.1 1.9 Virginica
144 6.8 3.2 5.9 2.3 Virginica
145 6.7 3.3 5.7 2.5 Virginica
146 6.7 3 5.2 2.3 Virginica
147 6.3 2.5 5 1.9 Virginica
148 6.5 3 5.2 2 Virginica
149 6.2 3.4 5.4 2.3 Virginica
150 5.9 3 5.1 1.8 Virginica

OUTPUT:-
first 5 rows
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
3 4.6 3.1 1.5 0.2 Setosa
4 5.0 3.6 1.4 0.2 Setosa

rows and columns

(150, 5)


Column and their datatypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal.length 150 non-null float64
1 sepal.width 150 non-null float64
2 petal.length 150 non-null float64
3 petal.width 150 non-null float64
4 variety 150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None

Statistical values

sepal.length sepal.width petal.length petal.width


count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333
std 0.828066 0.435866 1.765298 0.762238
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

Checking duplicates

sepal.length sepal.width petal.length petal.width variety


0 5.1 3.5 1.4 0.2 Setosa
50 7.0 3.2 4.7 1.4 Versicolor
100 6.3 3.3 6.0 2.5 Virginica

variety counts

RESULT:-
Thus the above program for exploratory descriptive analysis on the Iris data set is verified.
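A natural next step in the exploratory analysis is to summarise each variety separately and to check for missing values. The sketch below assumes the column names shown in the output above; summarize_by_variety is a hypothetical helper, and the sample rows are copied from the Iris listing.

```python
import pandas as pd


def summarize_by_variety(df):
    """Mean of every numeric column for each variety, plus missing-value counts per column."""
    means = df.groupby("variety").mean(numeric_only=True)
    missing = df.isnull().sum()
    return means, missing


if __name__ == "__main__":
    # With the laboratory file: df = pd.read_csv("Irisdata.csv")
    df = pd.DataFrame({
        "sepal.length": [5.1, 4.9, 6.7, 6.9],
        "sepal.width": [3.5, 3.0, 3.1, 3.1],
        "variety": ["Setosa", "Setosa", "Virginica", "Virginica"],
    })
    means, missing = summarize_by_variety(df)
    print(means)
    print(missing)
```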


Ex.No:5.1.a UNIVARIATE ANALYSIS


Date: PIMA INDIANS DIABETES FOR DIABETIC PATIENTS
AIM
To write a Python program for univariate analysis using the Pima Indians Diabetes data set for diabetic patients.
ALGORITHM :-
Step 1: Start the program.
Step 2: Import the pandas package.
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv).
Step 4: Separate the diabetic patients (outcome = 1) from the data set.
Step 5: Calculate frequency, mean, median, mode, standard deviation, variance, skewness and kurtosis using the built-in functions value_counts(), mean(), median(), mode(), std(), var(), skew() and kurt() for the independent fields pregnant, glucose, bp, skin, insulin, bmi, pedigree and age in the data set.
Step 6: Stop the program.
PROGRAM:
import pandas as pd
df=pd.read_csv("pima1.csv")
df1=df[df.outcome==1]
print(df1['pregnant'].value_counts())
print(df1['pregnant'].mean())
print(df1['pregnant'].std())
print(df1['pregnant'].mode())
print(df1['pregnant'].var())
print(df1['pregnant'].skew())
print(df1['pregnant'].kurt())
print(df1['glucose'].value_counts())
print(df1['glucose'].mean())
print(df1['glucose'].std())
print(df1['glucose'].mode())
print(df1['glucose'].var())
print(df1['glucose'].skew())
print(df1['glucose'].kurt())
print(df1['bp'].value_counts())
print(df1['bp'].mean())
print(df1['bp'].std())
print(df1['bp'].mode())
print(df1['bp'].var())
print(df1['bp'].skew())
print(df1['bp'].kurt())
print(df1['skin'].value_counts())
print(df1['skin'].mean())
print(df1['skin'].std())
print(df1['skin'].mode())
print(df1['skin'].var())
print(df1['skin'].skew())
print(df1['skin'].kurt())
print(df1['insulin'].value_counts())
print(df1['insulin'].mean())
print(df1['insulin'].std())
print(df1['insulin'].mode())
print(df1['insulin'].var())


print(df1['insulin'].skew())
print(df1['insulin'].kurt())
print(df1['bmi'].value_counts())
print(df1['bmi'].mean())
print(df1['bmi'].std())
print(df1['bmi'].mode())
print(df1['bmi'].var())
print(df1['bmi'].skew())
print(df1['bmi'].kurt())
print(df1['pedigree'].value_counts())
print(df1['pedigree'].mean())
print(df1['pedigree'].std())
print(df1['pedigree'].mode())
print(df1['pedigree'].var())
print(df1['pedigree'].skew())
print(df1['pedigree'].kurt())
print(df1['age'].value_counts())
print(df1['age'].mean())
print(df1['age'].std())
print(df1['age'].mode())
print(df1['age'].var())
print(df1['age'].skew())
print(df1['age'].kurt())


Pima Indians Diabetics.csv


s.no pregnant glucose bp skin insulin bmi pedigree age class
1 6 148 72 35 0 33.6 0.627 50 1
2 1 85 66 29 0 26.6 0.351 31 0
3 8 183 64 0 0 23.3 0.672 32 1
4 1 89 66 23 94 28.1 0.167 21 0
5 0 137 40 35 168 43.1 2.288 33 1
6 5 116 74 0 0 25.6 0.201 30 0
7 3 78 50 32 88 31 0.248 26 1
8 10 115 0 0 0 35.3 0.134 29 0
9 2 197 70 45 543 30.5 0.158 53 1
10 8 125 96 0 0 0 0.232 54 1
11 4 110 92 0 0 37.6 0.191 30 0
12 10 168 74 0 0 38 0.537 34 1
13 10 139 80 0 0 27.1 1.441 57 0
14 1 189 60 23 846 30.1 0.398 59 1
15 5 166 72 19 175 25.8 0.587 51 1
16 7 100 0 0 0 30 0.484 32 1
17 0 118 84 47 230 45.8 0.551 31 1
18 7 107 74 0 0 29.6 0.254 31 1
19 1 103 30 38 83 43.3 0.183 33 0
20 1 115 70 30 96 34.6 0.529 32 1
………
761 2 88 58 26 16 28.4 0.766 22 0
762 9 170 74 31 0 44 0.403 43 1
763 9 89 62 0 0 22.5 0.142 33 0
764 10 101 76 48 180 32.9 0.171 63 0
765 2 122 70 27 0 36.8 0.34 27 0
766 5 121 72 23 112 26.2 0.245 30 0
767 1 126 60 0 0 30.1 0.349 47 1
768 1 93 70 31 0 30.4 0.315 23 0

OUTPUT:
8 2
6 1
0 1
3 1
2 1
10 1
1 1
5 1
Name: pregnant, dtype: int64
4.777777777777778
3.492054473292827


0 8
Name: pregnant, dtype: int64
12.194444444444445
0.07976827393633572
-1.4086477342894645
148 1
183 1
137 1
78 1
197 1
125 1
168 1
189 1
166 1
Name: glucose, dtype: int64
154.55555555555554
37.4069215223303
0 78
1 125
2 137
3 148
4 166
5 168
6 183
7 189
8 197
Name: glucose, dtype: int64
1399.2777777777776
-1.0313853707975722
0.978100480723807
72 2
64 1
40 1
50 1
70 1
96 1
74 1
60 1
Name: bp, dtype: int64
66.44444444444444
15.898986690282428
0 72
Name: bp, dtype: int64
252.77777777777777
0.13656074039998925
1.0136266496454889
0 3
35 2
32 1
45 1
23 1


19 1
Name: skin, dtype: int64
21.0
17.392527130926087
0 0
Name: skin, dtype: int64
302.5
-0.21810449977720303
-1.6245455521188052
0 4
168 1
88 1
543 1
846 1
175 1
Name: insulin, dtype: int64
202.22222222222223
297.7233521987223
0 0
Name: insulin, dtype: int64
88639.19444444445
1.6550059400947466
1.9781799097232406
33.6 1
23.3 1
43.1 1
31.0 1
30.5 1
0.0 1
38.0 1
30.1 1
25.8 1
Name: bmi, dtype: int64
28.37777777777778
12.189521912053994
0 0.0
1 23.3
2 25.8
3 30.1
4 30.5
5 31.0
6 33.6
7 38.0
8 43.1
Name: bmi, dtype: float64
148.58444444444447
-1.6632175863057026
3.989200103644359
0.627 1
0.672 1
2.288 1


0.248 1
0.158 1
0.232 1
0.537 1
0.398 1
0.587 1
Name: pedigree, dtype: int64
0.6385555555555555
0.6462886566989844
0 0.158
1 0.232
2 0.248
3 0.398
4 0.537
5 0.587
6 0.627
7 0.672
8 2.288
Name: pedigree, dtype: float64
0.41768902777777767
2.521186410463069
6.957869024161264
50 1
32 1
33 1
26 1
53 1
54 1
34 1
59 1
51 1
Name: age, dtype: int64
43.55555555555556
12.135805608931685
0 26
1 32
2 33
3 34
4 50
5 51
6 53
7 54
8 59
Name: age, dtype: int64
147.27777777777774
-0.23884551536673332
-1.9149668580541754

RESULT:-
Thus the above program for univariate analysis on the Pima Indians Diabetes data set for diabetic patients is verified.
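The program above repeats the same calls for every column. A more compact sketch computes all the statistics in one table with DataFrame.agg(), and also includes the median, which the algorithm lists but this program does not print. It assumes the pima1.csv column names used above; univariate_summary and frequencies are hypothetical helper names, and when a column has several modes only the smallest one is kept.

```python
from pathlib import Path

import pandas as pd

# Column names as used in pima1.csv by the program above
FEATURES = ["pregnant", "glucose", "bp", "skin", "insulin", "bmi", "pedigree", "age"]


def univariate_summary(df, outcome):
    """Rows = statistics, columns = features, for patients with the given outcome."""
    group = df.loc[df["outcome"] == outcome, FEATURES]
    summary = group.agg(["mean", "median", "std", "var", "skew", "kurt"])
    # mode() can return several values per column; keep only the first (smallest)
    summary.loc["mode"] = group.mode().iloc[0]
    return summary


def frequencies(df, outcome, column):
    """Frequency table (value_counts) of one feature for the given outcome."""
    return df.loc[df["outcome"] == outcome, column].value_counts()


if __name__ == "__main__" and Path("pima1.csv").exists():
    df = pd.read_csv("pima1.csv")
    print(univariate_summary(df, outcome=1))      # diabetic patients
    print(frequencies(df, outcome=1, column="pregnant"))
```

Calling univariate_summary(df, outcome=0) gives the corresponding table for non-diabetic patients. Note that pandas returns NaN for kurtosis when a group has fewer than four values.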


Ex.No:5.1.b UNIVARIATE ANALYSIS


Date: PIMA INDIANS DIABETES FOR NON-DIABETIC PATIENTS
AIM
To write a Python program for univariate analysis using the Pima Indians Diabetes data set for non-diabetic patients.
ALGORITHM :-
Step 1: Start the program.
Step 2: Import the pandas package.
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv).
Step 4: Separate the non-diabetic patients (outcome = 0) from the data set.
Step 5: Calculate frequency, mean, median, mode, standard deviation, variance, skewness and kurtosis using the built-in functions value_counts(), mean(), median(), mode(), std(), var(), skew() and kurt() for the independent fields pregnant, glucose, bp, skin, insulin, bmi, pedigree and age in the data set.
Step 6: Stop the program.
PROGRAM:
import pandas as pd
df=pd.read_csv("pima1.csv")
df1=df[df.outcome==0]
print(df1['pregnant'].value_counts())
print(df1['pregnant'].mean())
print(df1['pregnant'].median())
print(df1['pregnant'].std())
print(df1['pregnant'].mode())
print(df1['pregnant'].var())
print(df1['pregnant'].skew())
print(df1['pregnant'].kurt())
print(df1['glucose'].value_counts())
print(df1['glucose'].mean())
print(df1['glucose'].median())
print(df1['glucose'].std())
print(df1['glucose'].mode())
print(df1['glucose'].var())
print(df1['glucose'].skew())
print(df1['glucose'].kurt())
print(df1['bp'].value_counts())
print(df1['bp'].mean())
print(df1['bp'].median())
print(df1['bp'].std())
print(df1['bp'].mode())
print(df1['bp'].var())
print(df1['bp'].skew())
print(df1['bp'].kurt())
print(df1['skin'].value_counts())
print(df1['skin'].mean())
print(df1['skin'].median())
print(df1['skin'].std())
print(df1['skin'].mode())
print(df1['skin'].var())
print(df1['skin'].skew())
print(df1['skin'].kurt())
print(df1['insulin'].value_counts())


print(df1['insulin'].mean())
print(df1['insulin'].median())
print(df1['insulin'].std())
print(df1['insulin'].mode())
print(df1['insulin'].var())
print(df1['insulin'].skew())
print(df1['insulin'].kurt())
print(df1['bmi'].value_counts())
print(df1['bmi'].mean())
print(df1['bmi'].median())
print(df1['bmi'].std())
print(df1['bmi'].mode())
print(df1['bmi'].var())
print(df1['bmi'].skew())
print(df1['bmi'].kurt())
print(df1['pedigree'].value_counts())
print(df1['pedigree'].mean())
print(df1['pedigree'].median())
print(df1['pedigree'].std())
print(df1['pedigree'].mode())
print(df1['pedigree'].var())
print(df1['pedigree'].skew())
print(df1['pedigree'].kurt())
print(df1['age'].value_counts())
print(df1['age'].mean())
print(df1['age'].median())
print(df1['age'].std())
print(df1['age'].mode())
print(df1['age'].var())
print(df1['age'].skew())
print(df1['age'].kurt())


Pima Indians Diabetics.csv: the same data set as listed in Ex.No:5.1.a.

OUTPUT:
1 2
10 2
5 1
4 1
Name: pregnant, dtype: int64
5.166666666666667
4.5
4.070217029430577
0 1
1 10
Name: pregnant, dtype: int64


16.56666666666667
0.3539476771784376
-1.9239379941621584
85 1
89 1
116 1
115 1
110 1
139 1
Name: glucose, dtype: int64
109.0
112.5
19.809088823063014
0 85
1 89
2 110
3 115
4 116
5 139
Name: glucose, dtype: int64
392.4
0.221379243643542
-0.31517019081197173
66 2
74 1
0 1
92 1
80 1
Name: bp, dtype: int64
63.0
70.0
32.36664950222682
0 66
Name: bp, dtype: int64
1047.6
-1.9408208876079585
4.311601666734459
0 4
29 1
23 1
Name: skin, dtype: int64
8.666666666666666
0.0
13.559744343705992
0 0
Name: skin, dtype: int64
183.86666666666665
1.052575991628368
-1.3694264585166165
0 5
94 1


Name: insulin, dtype: int64


15.666666666666666
0.0
38.37533930360312
0 0
Name: insulin, dtype: int64
1472.6666666666665
2.4494897427831783
6.0
26.6 1
28.1 1
25.6 1
35.3 1
37.6 1
27.1 1
Name: bmi, dtype: int64
30.05
27.6
5.074938423271753
0 25.6
1 26.6
2 27.1
3 28.1
4 35.3
5 37.6
Name: bmi, dtype: float64
25.75499999999999
0.9474768598128397
-1.3608302417826117
0.351 1
0.167 1
0.201 1
0.134 1
0.191 1
1.441 1
Name: pedigree, dtype: int64
0.41416666666666674
0.196
0.5085675635219638
0 0.134
1 0.167
2 0.191
3 0.201
4 0.351
5 1.441
Name: pedigree, dtype: float64
0.25864096666666664
2.336696757512407
5.534561849601891
30 2
31 1


21 1
29 1
57 1
Name: age, dtype: int64
33.0
30.0
12.312595177297109
0 30
Name: age, dtype: int64
151.6
1.923829603041346
4.4999860763988

RESULT:-
Thus the above program for univariate analysis on the Pima Indians Diabetes data set for non-diabetic patients is verified.
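To compare diabetic and non-diabetic patients directly (item d of experiment 5 in the syllabus), both groups can be summarised side by side with groupby(). This is a sketch assuming the same pima1.csv columns; compare_outcomes is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd

FEATURES = ["pregnant", "glucose", "bp", "skin", "insulin", "bmi", "pedigree", "age"]


def compare_outcomes(df, features=FEATURES):
    """Side-by-side mean, median and standard deviation for outcome 0 vs outcome 1."""
    table = df.groupby("outcome")[features].agg(["mean", "median", "std"])
    # Transpose so each row is (feature, statistic) and each column is an outcome
    return table.T


if __name__ == "__main__" and Path("pima1.csv").exists():
    print(compare_outcomes(pd.read_csv("pima1.csv")))
```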


Ex.No:5.2.a BIVARIATE ANALYSIS - LINEAR REGRESSION


Date: PIMA INDIAN DIABETES FOR DIABETIC PATIENTS

AIM
To write a Python program for linear regression using the Pima Indians Diabetes data set for diabetic patients.
ALGORITHM
Step 1: Start the program.
Step 2: Import the pandas, matplotlib.pyplot and scipy packages.
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv).
Step 4: Separate the diabetic patients (outcome = 1) from the data set.
Step 5: Get the x (age) and y (bp) values from the data set for linregress().
Step 6: Plot the x, y values using a scatter plot.
Step 7: Plot the regression line by calculating y = bx + a, where b is the slope and a is the intercept.
Step 8: Stop the program.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
df=pd.read_csv("pima1.csv")
df1=df[df.outcome==1]
x=df1["age"]
y=df1["bp"]
slope,intercept,r,p,std_err=stats.linregress(x,y)
def myfunc(x):
    # predicted bp on the regression line for a given age
    return slope*x+intercept
mymodel=list(map(myfunc,x))
plt.scatter(x,y)
plt.plot(x,mymodel)
plt.show()


OUTPUT:

RESULT:
Thus the above program for bivariate analysis using linear regression on the Pima Indians Diabetes data set for diabetic patients is verified.
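The program plots the fitted line but does not print the regression results. The sketch below reports the slope (b), intercept (a), r squared and p-value returned by stats.linregress(), and cross-checks the line with numpy.polyfit(). The sample points are hypothetical; with the lab data one would call regression_report(df1["age"], df1["bp"]).

```python
import numpy as np
from scipy import stats


def regression_report(x, y):
    """Slope, intercept, r squared and p-value of a simple linear regression of y on x."""
    fit = stats.linregress(x, y)
    return {"slope": fit.slope, "intercept": fit.intercept,
            "r_squared": fit.rvalue ** 2, "p_value": fit.pvalue}


if __name__ == "__main__":
    # Hypothetical points lying close to y = 2x + 1
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
    print(regression_report(x, y))
    # Cross-check: a degree-1 least-squares polynomial gives the same line
    print(np.polyfit(x, y, 1))     # [slope, intercept]
```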


Ex.No:5.2.b BIVARIATE ANALYSIS - LINEAR REGRESSION


Date: PIMA INDIAN DIABETES FOR NON-DIABETIC PATIENTS

AIM
To write a Python program for linear regression using the Pima Indians Diabetes data set for non-diabetic patients.
ALGORITHM
Step 1: Start the program.
Step 2: Import the pandas, matplotlib.pyplot and scipy packages.
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv).
Step 4: Separate the non-diabetic patients (outcome = 0) from the data set.
Step 5: Get the x (age) and y (bp) values from the data set for linregress().
Step 6: Plot the x, y values using a scatter plot.
Step 7: Plot the regression line by calculating y = bx + a, where b is the slope and a is the intercept.
Step 8: Stop the program.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
df=pd.read_csv("pima1.csv")
df1=df[df.outcome==0]
x=df1["age"]
y=df1["bp"]
slope,intercept,r,p,std_err=stats.linregress(x,y)
def myfunc(x):
    # predicted bp on the regression line for a given age
    return slope*x+intercept
mymodel=list(map(myfunc,x))
plt.scatter(x,y)
plt.plot(x,mymodel)
plt.show()


OUTPUT:

RESULT:
Thus the above program Bivariate Analysis on Linear Regression using Pima Indians Diabetes for Non-Diabetic Patients is verified.
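Programs 5.2.a and 5.2.b differ only in the filter (outcome == 1 versus outcome == 0). A hedged sketch of fitting both groups in one pass with pandas groupby() follows. The small DataFrame is hypothetical and only mimics the age, bp and outcome columns used above, and np.polyfit stands in for stats.linregress so the sketch needs just NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the age, bp and outcome columns of pima1.csv
df = pd.DataFrame({
    "age":     [30, 40, 50, 60, 25, 35, 45, 55],
    "bp":      [70, 75, 80, 85, 60, 62, 64, 66],
    "outcome": [1, 1, 1, 1, 0, 0, 0, 0],
})

fits = {}
for outcome, group in df.groupby("outcome"):
    # Degree-1 polyfit returns [slope, intercept] for bp = slope * age + intercept
    slope, intercept = np.polyfit(group["age"], group["bp"], 1)
    fits[outcome] = (slope, intercept)
    label = "diabetic" if outcome == 1 else "non-diabetic"
    print(f"{label}: bp = {slope:.3f} * age + {intercept:.3f}")
```

With the real data, replacing the hypothetical DataFrame by pd.read_csv("pima1.csv") should give one (slope, intercept) pair per outcome, which makes the two regression lines easier to compare side by side.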


Ex.No:5.3 BIVARIATE ANALYSIS - LOGISTIC REGRESSION - PIMA INDIAN DIABETES

Date:
AIM
To write a Python program for logistic regression using the Pima Indians Diabetes data set.
ALGORITHM
Step 1: Start the program
Step 2: Import the numpy, pandas and sklearn (linear_model) packages
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv)
Step 4: Get the x (bp) and y (outcome) values from the data set for linear_model.LogisticRegression()
Step 5: Predict y for a given x value using the model fitted on the Pima Indians Diabetes data set
Step 6:stop the program
PROGRAM:
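import numpy as np
import pandas as pd
from sklearn import linear_model

df = pd.read_csv("pima1.csv")
X = df[["bp"]]          # feature: blood pressure (one column, 2-D)
y = df[["outcome"]]     # target: 1 = diabetic, 0 = non-diabetic
logr = linear_model.LogisticRegression()
logr.fit(X.values, y.values.ravel())
# Predict the class for bp = 66, reshaped to one sample with one feature
predicted = logr.predict(np.array([66]).reshape(1, -1))
print(predicted)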
OUTPUT:
[1]

RESULT:
Thus the above program Bivariate Analysis on Logistic Regression using Pima Indians Diabetes set is
verified.
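LogisticRegression models the probability of outcome = 1 as the sigmoid of a linear score, p = 1 / (1 + e^-(b0 + b1·x)), and for two classes predict() picks class 1 when p exceeds 0.5 (that is, when the score is positive). The sketch below reproduces only that decision rule with NumPy; the coefficients b0 and b1 are hypothetical stand-ins for the fitted logr.intercept_ and logr.coef_, not values learned from pima1.csv.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_outcome(bp, b0, b1, threshold=0.5):
    """Return (probability, class) for blood-pressure values under a
    one-feature logistic model with intercept b0 and slope b1."""
    p = sigmoid(b0 + b1 * np.asarray(bp, dtype=float))
    # Class 1 only when p is strictly above the threshold
    return p, (p > threshold).astype(int)


# Hypothetical coefficients; in scikit-learn they would come from
# logr.intercept_[0] and logr.coef_[0][0] after logr.fit(...)
b0, b1 = -1.0, 0.02

prob, cls = predict_outcome([40, 66, 90], b0, b1)
print(prob.round(3), cls)
```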


Ex.No:5.4 MULTIPLE REGRESSION ANALYSIS - PIMA INDIAN DIABETES

Date:

AIM
To write a Python program for multiple regression analysis using the Pima Indians Diabetes data set.
ALGORITHM
Step 1: Start the program
Step 2: Import pandas and linear_model from the sklearn package
Step 3: Read data from the Pima Indians Diabetes data set (pima1.csv)
Step 4: Get x (age, bp) as the independent values and y (outcome) as the dependent values for linear_model.LinearRegression()
Step 5: Predict y for given x values using the model fitted on the Pima Indians Diabetes data set
Step 6:stop the program
PROGRAM:
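import pandas as pd
from sklearn import linear_model

df = pd.read_csv("pima1.csv")
X = df[['age', 'bp']]   # independent variables
y = df['outcome']       # dependent variable
regr = linear_model.LinearRegression()
regr.fit(X.values, y)
# Predict for age = 62.11 and bp = 63; the result is a continuous value, not a class label
predicteddia = regr.predict([[62.11, 63]])
print(predicteddia)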
OUTPUT:
[1.00002405]

RESULT:
Thus the above program multiple regression Analysis using Pima Indians Diabetes set is verified.
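LinearRegression with two inputs fits y = a + b1·age + b2·bp by ordinary least squares, so a prediction is the intercept plus the weighted inputs (in scikit-learn terms, regr.intercept_ plus regr.coef_ dotted with x). The sketch below reproduces that fit with np.linalg.lstsq on a design matrix whose first column is all ones. The rows are made up from y = 1 + 2·age + 3·bp, so the coefficients the fit should recover are known in advance. Since outcome is 0 or 1, a value such as 1.00002405 from the real model is a continuous score rather than a class label.

```python
import numpy as np

# Made-up rows generated from y = 1 + 2*age + 3*bp, so the fit should recover (1, 2, 3)
age = np.array([20.0, 35.0, 50.0, 62.0, 41.0])
bp = np.array([60.0, 72.0, 64.0, 80.0, 70.0])
y = 1 + 2 * age + 3 * bp

# Design matrix: a column of ones for the intercept, then one column per feature
A = np.column_stack([np.ones_like(age), age, bp])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b_age, b_bp = coef
print(f"intercept={intercept:.3f}, b_age={b_age:.3f}, b_bp={b_bp:.3f}")

# Prediction for a new (age, bp) pair: intercept + b_age * age + b_bp * bp
new_point = np.array([62.11, 63.0])
prediction = intercept + coef[1:] @ new_point
print(prediction)
```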


Ex.No:5.5 RESULT ANALYSIS


Date:

AIM:
To perform result analysis for 5.1.a, 5.1.b, 5.2.a and 5.2.b using the Pima Indians Diabetes data set.
ANALYSIS
5.1.a vs 5.1.b

Each category lists its frequency table (value: count) and mode for the Diabetes and Non-Diabetes groups, followed by the numeric summary functions.

pregnant
  frequency (value: count)
    Diabetes:       8: 2, 6: 1, 0: 1, 3: 1, 2: 1, 10: 1, 1: 1, 5: 1
    Non-Diabetes:   1: 2, 10: 2, 5: 1, 4: 1
  mode
    Diabetes:       8
    Non-Diabetes:   1, 10
  function              Diabetes                 Non-Diabetes
  mean                  4.777777777777778        5.166666666666667
  standard deviation    3.492054473292827        4.070217029430577
  variance              12.194444444444445       16.56666666666667
  skewness              0.07976827393633572      0.3539476771784376
  kurtosis              -1.4086477342894645      -1.9239379941621584

glucose
  frequency (value: count)
    Diabetes:       148: 1, 183: 1, 137: 1, 78: 1, 197: 1, 125: 1, 168: 1, 189: 1, 166: 1
    Non-Diabetes:   85: 1, 89: 1, 116: 1, 115: 1, 110: 1, 139: 1
  mode
    Diabetes:       78, 125, 137, 148, 166, 168, 183, 189, 197
    Non-Diabetes:   85, 89, 110, 115, 116, 139
  function              Diabetes                 Non-Diabetes
  mean                  154.55555555555554       109.0
  standard deviation    37.4069215223303         19.809088823063014
  variance              1399.2777777777776       392.4
  skewness              -1.0313853707975722      0.221379243643542
  kurtosis              0.978100480723807        -0.31517019081197173

bp
  frequency (value: count)
    Diabetes:       72: 2, 64: 1, 40: 1, 50: 1, 70: 1, 96: 1, 74: 1, 60: 1
    Non-Diabetes:   66: 2, 74: 1, 0: 1, 92: 1, 80: 1
  mode
    Diabetes:       72
    Non-Diabetes:   66
  function              Diabetes                 Non-Diabetes
  mean                  66.44444444444444        63.0
  standard deviation    15.898986690282428       32.36664950222682
  variance              252.77777777777777       1047.6
  skewness              0.13656074039998925      -1.9408208876079585
  kurtosis              1.0136266496454889       4.311601666734459

skin
  frequency (value: count)
    Diabetes:       0: 3, 35: 2, 32: 1, 45: 1, 23: 1, 19: 1
    Non-Diabetes:   0: 4, 29: 1, 23: 1
  mode
    Diabetes:       0
    Non-Diabetes:   0
  function              Diabetes                 Non-Diabetes
  mean                  21.0                     8.666666666666666
  standard deviation    17.392527130926087       13.559744343705992
  variance              302.5                    183.86666666666665
  skewness              -0.21810449977720303     1.052575991628368
  kurtosis              -1.6245455521188052      -1.3694264585166165

insulin
  frequency (value: count)
    Diabetes:       0: 4, 168: 1, 88: 1, 543: 1, 846: 1, 175: 1
    Non-Diabetes:   0: 5, 94: 1
  mode
    Diabetes:       0
    Non-Diabetes:   0
  function              Diabetes                 Non-Diabetes
  mean                  202.22222222222223       15.666666666666666
  standard deviation    297.7233521987223        38.37533930360312
  variance              88639.19444444445        1472.6666666666665
  skewness              1.6550059400947466       2.4494897427831783
  kurtosis              1.9781799097232406       6.0

bmi
  frequency (value: count)
    Diabetes:       33.6: 1, 23.3: 1, 43.1: 1, 31.0: 1, 30.5: 1, 0.0: 1, 38.0: 1, 30.1: 1, 25.8: 1
    Non-Diabetes:   26.6: 1, 28.1: 1, 25.6: 1, 35.3: 1, 37.6: 1, 27.1: 1
  mode
    Diabetes:       0.0, 23.3, 25.8, 30.1, 30.5, 31.0, 33.6, 38.0, 43.1
    Non-Diabetes:   25.6, 26.6, 27.1, 28.1, 35.3, 37.6
  function              Diabetes                 Non-Diabetes
  mean                  28.37777777777778        30.05
  standard deviation    12.189521912053994       5.074938423271753
  variance              148.58444444444447       25.75499999999999
  skewness              -1.6632175863057026      0.9474768598128397
  kurtosis              3.989200103644359        -1.3608302417826117

pedigree
  frequency (value: count)
    Diabetes:       0.627: 1, 0.672: 1, 2.288: 1, 0.248: 1, 0.158: 1, 0.232: 1, 0.537: 1, 0.398: 1, 0.587: 1
    Non-Diabetes:   0.351: 1, 0.167: 1, 0.201: 1, 0.134: 1, 0.191: 1, 1.441: 1
  mode
    Diabetes:       0.158, 0.232, 0.248, 0.398, 0.537, 0.587, 0.627, 0.672, 2.288
    Non-Diabetes:   0.134, 0.167, 0.191, 0.201, 0.351, 1.441
  function              Diabetes                 Non-Diabetes
  mean                  0.6385555555555555       0.41416666666666674
  standard deviation    0.6462886566989844       0.5085675635219638
  variance              0.41768902777777767      0.25864096666666664
  skewness              2.521186410463069        2.336696757512407
  kurtosis              6.957869024161264        5.534561849601891

age
  frequency (value: count)
    Diabetes:       50: 1, 32: 1, 33: 1, 26: 1, 53: 1, 54: 1, 34: 1, 59: 1, 51: 1
    Non-Diabetes:   30: 2, 31: 1, 21: 1, 29: 1, 57: 1
  mode
    Diabetes:       26, 32, 33, 34, 50, 51, 53, 54, 59
    Non-Diabetes:   30
  function              Diabetes                 Non-Diabetes
  mean                  43.55555555555556        33.0
  standard deviation    12.135805608931685       12.312595177297109
  variance              147.27777777777774       151.6
  skewness              -0.23884551536673332     1.923829603041346
  kurtosis              -1.9149668580541754      4.4999860763988
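The statistics in the table above can be regenerated with pandas. As a hedged sketch, the snippet below rebuilds only the pregnant column from the frequency counts listed in the table (Diabetes: 8 twice, then 6, 0, 3, 2, 10, 1, 5; Non-Diabetes: 1 twice, 10 twice, 5, 4) and applies the same functions per outcome group; with the real file one would group pima1.csv by outcome instead. pandas std() and var() use the sample (n - 1) denominator, which is consistent with the reported values (3.492² ≈ 12.194).

```python
import pandas as pd

# pregnant values reconstructed from the frequency counts in the table above
df = pd.DataFrame({
    "pregnant": [8, 8, 6, 0, 3, 2, 10, 1, 5] + [1, 1, 10, 10, 5, 4],
    "outcome": [1] * 9 + [0] * 6,
})

for outcome, group in df.groupby("outcome"):
    col = group["pregnant"]
    label = "Diabetes" if outcome == 1 else "Non-Diabetes"
    print(label)
    print("  frequency:", col.value_counts().to_dict())
    print("  mean:", col.mean())
    print("  standard deviation:", col.std())  # sample std (ddof=1)
    print("  mode:", col.mode().tolist())
    print("  variance:", col.var())            # sample variance (ddof=1)
    print("  skewness:", col.skew())
    print("  kurtosis:", col.kurt())           # excess kurtosis (normal = 0)
```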


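5.2.a vs 5.2.b

5.2.a: regression plot for diabetic patients (figure)
5.2.b: regression plot for non-diabetic patients (figure)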

RESULT:-
Thus the results of programs 5.1.a, 5.1.b, 5.2.a and 5.2.b are analyzed.


Ex.no:6.1 NORMAL CURVES – IRIS DATA SET


Date:

AIM
To write a Python program for normal curves using the Iris data set.

ALGORITHM:-
Step 1: Start the program
Step 2: Import the pandas and matplotlib.pyplot packages
Step 3: Read data from the Iris data set (irisdata.csv)
Step 4: Get x (sepal.length) and y (petal.length) from the data set
Step 5: Plot a dotted-line curve using the x, y values
Step 6: Stop the program

PROGRAM:-
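import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("irisdata.csv")
# Sort by the x column so the dotted line runs left to right instead of zig-zagging
df = df.sort_values("sepal.length")
x = df['sepal.length']
y = df['petal.length']
plt.plot(x, y, linestyle='dotted')
plt.show()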


OUTPUT:-

RESULT:-

Thus the above program Normal Curves using Iris Data Set is verified.
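The program above joins sepal.length and petal.length values with a dotted line. A normal curve in the statistical sense is the Gaussian density f(x) = 1/(σ√(2π)) · exp(-(x - μ)²/(2σ²)), drawn with a column's mean μ and standard deviation σ. A minimal sketch under that reading follows; the plotting helper imports pandas and matplotlib only when called, and it assumes the same irisdata.csv file and sepal.length column used above.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))


def normal_curve(values, num_points=200):
    """Return (xs, pdf) for a normal curve fitted to the mean and std of values."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std(ddof=1)
    # Cover mu +/- 4 sigma, which holds practically all of the curve's area
    xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, num_points)
    return xs, normal_pdf(xs, mu, sigma)


def plot_normal_curve(csv_path="irisdata.csv", column="sepal.length"):
    """Overlay the fitted normal curve on a density-scaled histogram of one column."""
    import pandas as pd
    import matplotlib.pyplot as plt

    values = pd.read_csv(csv_path)[column]
    xs, pdf = normal_curve(values)
    plt.hist(values, bins=20, density=True, alpha=0.5, label=column)
    plt.plot(xs, pdf, linestyle="dotted", label="normal curve")
    plt.legend()
    plt.show()
```

Calling plot_normal_curve() with the lab's own file should draw the bell-shaped curve over the histogram; density=True scales the bars so both share the same y-axis.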


Ex.no:6.2 DENSITY AND CONTOUR PLOTS - PIMA INDIANS DIABETES DATA SET

Date:
AIM:
To write a Python program for density and contour plots using the Pima Indians Diabetes data set.
ALGORITHM:
Step 1: Start the program
Step 2: Import the numpy, pandas and matplotlib.pyplot packages
Step 3: Read data from the Pima Indians Diabetes data set (pima Indian Diabetes.csv)
Step 4: Get the x (bp) and y (glucose) values from the data set
Step 5: Generate combinations of grids using meshgrid()
Step 6: Define a function func() for calculating the third value z
Step 7: Draw the rectangular contour plot using contour()
Step 8: Stop the program
PROGRAM:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("pima Indian Diabetes.csv")

# define a function for the third value z
def func(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2

# get x, y as sorted unique values so the grid is ordered
x = np.unique(df['bp'])
y = np.unique(df['glucose'])
# Generate combination of grids
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
# Draw rectangular contour plot
plt.contour(X, Y, Z, cmap='gist_rainbow_r')
plt.show()


OUTPUT:-

RESULT:-
Thus the above program Density and Contour Plots using the Pima Indians Diabetes data set is verified.
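func() in the program above is a fixed trigonometric surface, so its contours do not describe the data itself. A density contour plot usually shows an estimate of the joint density of two columns, for example a Gaussian kernel density estimate (KDE). The sketch below implements a 2-D Gaussian KDE directly in NumPy on a grid; the bandwidths hx and hy are assumptions to tune (scipy.stats.gaussian_kde offers automatic bandwidth selection), and the plotting helper assumes the same CSV file and bp/glucose columns as the program.

```python
import numpy as np


def kde_grid(x, y, grid_x, grid_y, hx, hy):
    """2-D Gaussian kernel density estimate evaluated on meshgrid(grid_x, grid_y).

    Returns X, Y, Z where Z[i, j] is the estimated density at (grid_x[j], grid_y[i]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X, Y = np.meshgrid(grid_x, grid_y)
    # Scaled distance from every grid point to every sample: shape (ny, nx, n)
    dx = (X[..., None] - x) / hx
    dy = (Y[..., None] - y) / hy
    kernels = np.exp(-0.5 * (dx ** 2 + dy ** 2)) / (2 * np.pi * hx * hy)
    # Averaging the per-sample kernels makes the surface integrate to 1
    return X, Y, kernels.mean(axis=-1)


def plot_density_contour(csv_path="pima Indian Diabetes.csv", hx=5.0, hy=10.0):
    """Draw density contours of bp vs glucose (bandwidths hx, hy are assumptions)."""
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv(csv_path)
    x, y = df["bp"], df["glucose"]
    # A 60 x 60 grid keeps the (ny, nx, n) kernel array small enough for memory
    gx = np.linspace(x.min(), x.max(), 60)
    gy = np.linspace(y.min(), y.max(), 60)
    X, Y, Z = kde_grid(x, y, gx, gy, hx, hy)
    plt.contour(X, Y, Z, cmap="gist_rainbow_r")
    plt.xlabel("bp")
    plt.ylabel("glucose")
    plt.show()
```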


Ex.No:6.3.a CORRELATION – IRIS DATA SET


Date:

AIM:
To write a Python program for correlation using the Iris data set.
ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Calculate the correlation coefficient r using DataFrame.corr(method="pearson") from the pandas package.
5. Print the coefficient r for each pair of numeric columns.
6. Stop the program.
PROGRAM:-
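import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
# numeric_only=True skips the text column 'variety'; recent pandas versions raise an error without it
x = df.corr(method='pearson', numeric_only=True)
print(x)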

OUTPUT:
              sepal.length  sepal.width  petal.length  petal.width
sepal.length      1.000000    -0.117570      0.871754     0.817941
sepal.width      -0.117570     1.000000     -0.428440    -0.366126
petal.length      0.871754    -0.428440      1.000000     0.962865
petal.width       0.817941    -0.366126      0.962865     1.000000

RESULT:
Thus the above program Correlation using Iris Data Set is verified.
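With method='pearson', each cell of the matrix is r = Σ(x - x_mean)(y - y_mean) / √(Σ(x - x_mean)² · Σ(y - y_mean)²) for one pair of columns. A short NumPy sketch of that formula for a single pair follows, cross-checked against np.corrcoef. The values are hand-picked for illustration and are not claimed to be rows of iris.csv.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Hand-picked illustrative values
sepal_length = [5.1, 4.9, 6.3, 5.8, 7.1, 6.5]
petal_length = [1.4, 1.5, 6.0, 5.1, 5.9, 4.6]

r = pearson_r(sepal_length, petal_length)
print(round(r, 4), np.isclose(r, np.corrcoef(sepal_length, petal_length)[0, 1]))
```

r lies between -1 and 1: values near 1 (such as petal.length vs petal.width, 0.962865) indicate a strong positive linear relationship, and negative values indicate that one measurement tends to fall as the other rises.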


Ex.No:6.3.b SCATTER PLOT - IRIS DATA SET


Date:
AIM:
To write the Python program for a scatter plot using the Iris data set.
ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Draw the scatter plot for the 3 varieties of flowers using their sepal length and petal length.
5. Stop the program.

PROGRAM:-
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
df1=df[df.variety=='Setosa']
x1=df1['sepal.length']
y1=df1['petal.length']
df2=df[df.variety=='Versicolor']
x2=df2['sepal.length']
y2=df2['petal.length']
df3=df[df.variety=='Virginica']
x3=df3['sepal.length']
y3=df3['petal.length']
plt.scatter(x1,y1,color='red',marker='o',label='Setosa')
plt.scatter(x2,y2,color='blue',marker='s',label='Versicolor')
plt.scatter(x3,y3,color='green',marker='x',label='Virginica')
plt.title('iris_scatterplot')
plt.xlabel("sepal.length[cm]")
plt.ylabel("petal.length[cm]")
plt.legend()  # display the Setosa/Versicolor/Virginica labels set above
plt.show()


OUTPUT:

RESULT:
Thus the above program Scatter Plot using Iris Data Set is verified.
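The program repeats the filter-and-plot code once per variety. A more compact variant, sketched here as an alternative rather than as the manual's method, loops over df.groupby('variety') and picks a colour and marker per group. The style table mirrors the colours and markers used above, and the splitting step is a separate helper so it can be checked without drawing anything.

```python
import pandas as pd

# Colour and marker per variety, matching the three plt.scatter() calls above
STYLES = {
    "Setosa": ("red", "o"),
    "Versicolor": ("blue", "s"),
    "Virginica": ("green", "x"),
}


def split_by_variety(df, x_col="sepal.length", y_col="petal.length"):
    """Return {variety: (x values, y values)} for every variety in df."""
    return {
        variety: (group[x_col].tolist(), group[y_col].tolist())
        for variety, group in df.groupby("variety")
    }


def plot_iris_scatter(csv_path="iris.csv"):
    """Scatter sepal length against petal length, one colour/marker per variety."""
    import matplotlib.pyplot as plt

    df = pd.read_csv(csv_path)
    for variety, (xs, ys) in split_by_variety(df).items():
        color, marker = STYLES.get(variety, ("black", "."))
        plt.scatter(xs, ys, color=color, marker=marker, label=variety)
    plt.title("iris_scatterplot")
    plt.xlabel("sepal.length [cm]")
    plt.ylabel("petal.length [cm]")
    plt.legend()
    plt.show()
```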


Ex.No:6.4 HISTOGRAM – IRIS DATA SET


Date:

AIM:
To write the python program for Histogram using Iris Data set.

ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Draw the histogram using hist() with sepal.width on the x-axis.
5. Draw the histogram using hist() with sepal.length on the x-axis.
6. Draw the histogram using hist() with petal.length on the x-axis.
7. Draw the histogram using hist() with petal.width on the x-axis.
8. Stop the program.

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
x1=df["sepal.width"]
plt.subplot(2,2,1)
plt.hist(x1,bins=20,color="green")
plt.title("sepal width in cm")
plt.xlabel("sepal width cm")
plt.ylabel("count")
x2=df["sepal.length"]
plt.subplot(2,2,2)
plt.hist(x2,bins=20,color="blue")
plt.title("sepal length in cm")
plt.xlabel("sepal length cm")
plt.ylabel("count")
x3=df["petal.length"]
plt.subplot(2,2,3)
plt.hist(x3,bins=20,color="red")
plt.title("petal length in cm")
plt.xlabel("petal length cm")
plt.ylabel("count")
x4=df["petal.width"]
plt.subplot(2,2,4)
plt.hist(x4,bins=20,color="yellow")
plt.title("petal width in cm")
plt.xlabel("petal width cm")
plt.ylabel("count")
plt.tight_layout()  # keep the four subplot titles and labels from overlapping
plt.show()


OUTPUT:

RESULT:
Thus the above program Histogram using Iris Data Set is verified.
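plt.hist(x, bins=20) divides the range from the column's minimum to its maximum into 20 equal-width bins and counts the values falling in each. np.histogram returns the same counts and bin edges without plotting, which is a convenient way to check what each bar represents. The sepal widths below are illustrative values, not the full iris column.

```python
import numpy as np

# Illustrative sepal widths (cm), not the full iris column
sepal_width = np.array([3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 2.9, 3.1, 3.7, 2.3, 4.4])

counts, edges = np.histogram(sepal_width, bins=20)
width = edges[1] - edges[0]
print(f"{len(counts)} bins of width {width:.3f} cm from {edges[0]} to {edges[-1]}")
for left, right, count in zip(edges[:-1], edges[1:], counts):
    if count:  # print only the non-empty bins
        # Every bin is half-open [left, right) except the last, which also includes right
        close = "]" if right == edges[-1] else ")"
        print(f"[{left:.2f}, {right:.2f}{close}: {count}")
```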


Ex.No:6.5 THREE DIMENSIONAL PLOTTING – PIMA INDIANS DIABETES SET

Date:

AIM:
To write the python program for three dimensional plotting using Pima Indians Diabetes Data set.

ALGORITHM:
1. Start the program.
2. Import mplot3d from mpl_toolkits, and the numpy, pandas and matplotlib.pyplot packages.
3. Read the pima1.csv file.
4. Read x (age) and y (bp).
5. Generate the grid for the X and Y axes using meshgrid().
6. Generate the Z axis values using f(x, y).
7. Draw the wireframe using plot_wireframe().
8. Stop the program.

PROGRAM:
from mpl_toolkits import mplot3d  # registers the '3d' projection on older Matplotlib versions
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("pima1.csv")

# function for the z axis
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# x and y axis: sorted unique values so the wireframe grid is ordered
x = np.unique(df['age'])
y = np.unique(df['bp'])
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_title('wireframe plot of age and bp')
plt.show()


OUTPUT:

RESULT:
Thus the above program three dimensional plotting using Pima Indians Diabetes Set is verified.
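np.meshgrid(x, y) returns two arrays of shape (len(y), len(x)), one row per y value, and f(X, Y) = sin(√(X² + Y²)) is evaluated element by element on them, giving the Z grid that plot_wireframe() expects. Because plot_wireframe() joins neighbouring grid points, the mesh reads more clearly when x and y are sorted; this sketch sorts and de-duplicates them with np.unique. The plotting helper assumes pima1.csv with age and bp columns and uses projection="3d", which recent Matplotlib versions accept without importing mplot3d explicitly.

```python
import numpy as np


def f(x, y):
    """Surface height used in the program: sin(sqrt(x^2 + y^2))."""
    return np.sin(np.sqrt(x ** 2 + y ** 2))


def surface_grid(x, y):
    """Sorted, de-duplicated grid of (X, Y, Z) for a wireframe plot."""
    xs = np.unique(np.asarray(x, dtype=float))  # np.unique also sorts
    ys = np.unique(np.asarray(y, dtype=float))
    X, Y = np.meshgrid(xs, ys)  # both have shape (len(ys), len(xs))
    return X, Y, f(X, Y)


def plot_wireframe(csv_path="pima1.csv"):
    """Draw the age/bp wireframe; assumes the file has 'age' and 'bp' columns."""
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv(csv_path)
    X, Y, Z = surface_grid(df["age"], df["bp"])
    ax = plt.axes(projection="3d")
    ax.plot_wireframe(X, Y, Z, color="green")
    ax.set_xlabel("age")
    ax.set_ylabel("bp")
    ax.set_zlabel("f(age, bp)")
    plt.show()
```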


Ex.No:7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP


Date:

AIM:
To write the python program for Visualizing Geographic Data with Basemap
ALGORITHM:
1. Start the program.
2. Import matplotlib.pyplot and Basemap from mpl_toolkits.basemap package.
3. Create the figure window using figure().
4. Draw the coastline using drawcoastlines()
5. Draw the countries using drawcountries().
6. Fill the continents using fillcontinents().
7. Draw the boundaries using drawmapboundary().
8. Stop the program.
PROGRAM:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(12,12))
m=Basemap()
m.drawcoastlines(linewidth=1.0,linestyle='solid',color='black')
m.drawcountries(linewidth=1.0,linestyle='solid',color='k')
m.fillcontinents(color='coral',lake_color='aqua')
m.drawmapboundary(color='b',linewidth=2.0,fill_color='aqua')
plt.title("Filled map boundary",fontsize=20)
plt.show()

OUTPUT:

RESULT:-
Thus the above program for Visualizing Geographic Data with Basemap is verified.
