
CS3361-Data Science Laboratory Manual


Salem college of Engineering and Technology Data Science Laboratory

Salem College
of Engineering and Technology
NH-68, Salem-Attur Main Road, Mettupatty, Perumapalayam, Selliamman
Nagar, Salem, Tamil Nadu 636111

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

AND
DEPARTMENT OF INFORMATION TECHNOLOGY

ACADEMIC YEAR : 2023-2024

Lab Manual

CS3361- Data Science Laboratory

(III semester)
Regulation 2021

Prepared By

Ms.S.Namagiri, AP/CSE

Ms.A.Merlin, AP/CSE
Department of CSE & IT

CS3361 DATA SCIENCE LABORATORY
L T P C
0 0 4 2

COURSE OBJECTIVES:
• To understand the Python libraries for data science.
• To understand the basic statistical and probability measures for data science.
• To learn descriptive analytics on the benchmark data sets.
• To apply correlation and regression analytics on standard data sets.
• To present and interpret data using visualization packages in Python.
LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
a.Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a.Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
LIST OF EQUIPMENT: (30 Students per Batch)

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, seaborn, plotly, bokeh
Note: Example data sets such as UCI, Iris and Pima Indians Diabetes.


LIST OF EXPERIMENTS

Ex.No Name of the Experiments

1 Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages

2 Working with Numpy arrays

3 Working with Pandas data frames

4a Reading data from text files, Excel and the web

4b Descriptive analytics on the Iris data set

5.1.a Univariate Analysis - Pima Indians Diabetes for Diabetic Patients

5.1.b Univariate Analysis - Pima Indians Diabetes for Non Diabetic Patients

5.2.a Bivariate Analysis - Linear Regression -Pima Indians Diabetes for Diabetic Patients

5.2.b Bivariate Analysis - Linear Regression -Pima Indians Diabetes for Non Diabetic Patients

5.3 Bivariate Analysis -Logistic regression -Pima Indians Diabetes

5.4 Multiple Regression analysis -Pima Indians Diabetes

5.5 Result Analysis

6.1 Normal curves - Iris data set

6.2 Density and contour plots - Pima Indians Diabetes data set

6.3.a Correlation-Iris data set

6.3.b Scatter plot - Iris data set

6.4 Histogram -Iris data set

6.5 Three dimensional plotting

7 Visualizing Geographic Data with Basemap


Ex No:1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

Date:

AIM:-
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
DESCRIPTION:-
1. Download the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages from the web, starting with NumPy.
2. Check the Python version before installing NumPy, because most systems already have Python pre-installed.
To check the version of Python 3, run the command python3 --version (python --version on Windows).
3. Install the package on Windows. First install pip, a package manager for installing and managing
Python software packages. The easiest way to install NumPy is by using pip.
4. Once pip is set up, NumPy can be installed from the command line by typing pip install numpy.
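Once the packages are installed, the installation can be checked from Python itself. The following is a minimal sketch, assuming the five packages were installed with pip under the distribution names listed in PACKAGES; the helper name installed_versions is hypothetical and not part of any library.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match what pip installed).
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" can then be added with pip install followed by its name.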
FEATURES:-
1. NUMPY
• NumPy stands for Numerical Python.
• NumPy is one of the most commonly used packages for scientific computing in Python.
• It provides a high-performance N-dimensional array object.
• It contains tools for integrating code from C/C++ and Fortran.
• It serves as a multidimensional container for generic data.
• It adds linear algebra, Fourier transform and random number capabilities.
• It supports broadcasting functions.

2. SCIPY:
• SciPy stands for Scientific Python.
• SciPy is a scientific computation library that uses NumPy underneath.
• It provides more utility functions for optimization, statistics and signal processing.
• SciPy is a Python library that is useful in solving many mathematical equations and algorithms.
• It is built on top of the NumPy library and extends it with scientific and mathematical
routines such as matrix rank, inverse, polynomial equations, LU decomposition, etc.
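The matrix routines named above can be tried on a small example. This is only an illustrative sketch: the 2x2 matrix A is made up for the demonstration, and the rank is computed with NumPy's np.linalg.matrix_rank while the inverse and LU decomposition come from scipy.linalg.

```python
import numpy as np
from scipy import linalg

# A small invertible matrix chosen only for illustration.
A = np.array([[4.0, 3.0], [6.0, 3.0]])

rank = np.linalg.matrix_rank(A)   # number of linearly independent rows
A_inv = linalg.inv(A)             # matrix inverse
P, L, U = linalg.lu(A)            # LU decomposition with A = P @ L @ U

print("rank:", rank)
print("inverse:\n", A_inv)
print("P @ L @ U reproduces A:", np.allclose(P @ L @ U, A))
```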


3. JUPYTER:-
• Jupyter is a loose acronym meaning Julia, Python and R.
• These programming languages were the first target languages of the Jupyter application.
• The main components of the whole environment are the notebooks themselves and the
notebook application.
• Jupyter Notebook is an open-source, web-based interactive environment, which allows you to
create and share documents that contain live code, mathematical equations, graphics, maps, plots,
visualizations and narrative text.


4. STATSMODELS:-
• Statsmodels stands for statistical models. It is a Python package that allows users
to explore data, estimate statistical models and perform statistical tests.
• An extensive list of descriptive statistics, statistical tests, plotting functions and result
statistics is available for different types of data and for each estimator.


5. PANDAS:-
• The name pandas is derived from "panel data" and is also read as "Python Data Analysis Library".
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.


RESULT:-
Thus the above packages were downloaded and installed, and their features were studied.


Ex No:2 WORKING WITH NUMPY ARRAY

Date:

AIM:-

To write a python program using numpy array.

ALGORITHM:-

1. Import numpy package.

2. Create an array from a list with type float using array() and print it.

3. Create an array from a tuple using array() and print it.

4. Create a 3*4 array of all zeros using zeros() and print it.

5. Create a constant-value array of complex type using full() and print it.

6. Create a sequence of integers from 0 to 30 in steps of 5 using arange() and print it.

7. Reshape a 3*4 array to a 2*2*3 array using reshape() and print it.

8. Flatten an array using flatten() and print it.

9. Join two arrays using concatenate() and print the result.

10. Split an array using array_split() and print the result.

11. Sort a given array using sort() and print it.

12. Search for a given key value in an array using where() and print its positions.

PROGRAM :

import numpy as np

#1.Creating array from list with type float
a=np.array([[1,2,4],[5,8,7]],dtype='float')
print("\n 1.Array created using passed list:\n",a)

#2.Creating array from tuple
b=np.array((1,3,2))
print("\n 2.Array created using passed tuple:\n",b)

#3.Creating a 3*4 array with all zeros
c=np.zeros((3,4))
print("\n 3.An array initialized with all zeros:\n",c)

#4.Create a constant value array of complex type
d=np.full((3,3),6,dtype='complex')
print("\n 4.An array initialized with all 6's with type complex:\n",d)

#5.Create a sequence of integers from 0 to 30 with steps of 5 (30 itself is excluded)
e=np.arange(0,30,5)
print("\n 5.A sequence array with steps of 5:\n",e)

#6.Reshaping 3*4 array to 2*2*3 array
f=np.array([[1,2,3,4],[5,2,4,2],[1,2,0,1]])
fnew=f.reshape(2,2,3)
print("\n 6.Old array\n",f)
print("\n New array\n",fnew)

#7.Flatten array
g=np.array([[1,2,3],[4,5,6]])
gflat=g.flatten()
print("\n 7.Old array\n",g)
print("\n Flatten array\n",gflat)

#8.Join array
arr1=np.array([1,2,3])
arr2=np.array([4,5,6])
arr=np.concatenate((arr1,arr2))
print("\n 8.Joined array\n",arr)

#9.Split into three parts
h=np.array([1,2,3,4,5,6])
hnew=np.array_split(h,3)
print("\n 9.Split array\n",hnew)

#10.Sort (a 2D array is sorted along each row)
i=np.array([3,2,0,1])
print("\n 10.Sorted 1D array:\n",np.sort(i))
j=np.array([[3,2,4],[5,0,1]])
print("\n 10.Sorted 2D array:\n",np.sort(j))

#11.Search for the positions at which the value is 4
k=np.array([1,2,3,4,5,4,4])
knew=np.where(k==4)
print("\n 11.Searched array position:\n",knew)

OUTPUT :
1.Array created using passed list:
[[1. 2. 4.]
[5. 8. 7.]]

2.Array created using passed tuple:

[1 3 2]

3.An array initialized with all zeros:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
4.An array initialized with all 6's with type complex:
[[6.+0.j 6.+0.j 6.+0.j]
[6.+0.j 6.+0.j 6.+0.j]
[6.+0.j 6.+0.j 6.+0.j]]
5.A sequence array with steps of 5:
[ 0 5 10 15 20 25]
6.Old array
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
New array
[[[1 2 3]
[4 5 2]]
[[4 2 1]
[2 0 1]]]
7.Old array
[[1 2 3]
[4 5 6]]
Flatten array
[1 2 3 4 5 6]
8.Joined array
[1 2 3 4 5 6]
9.Split array
[array([1, 2]), array([3, 4]), array([5, 6])]
10.Sorted 1D array:
[0 1 2 3]
10.Sorted 2D array:
[[2 3 4]
[0 1 5]]
11.Searched array position:
(array([3, 5, 6], dtype=int64),)
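
The search result above is printed as a one-element tuple because where() returns one index array per dimension of the input. A minimal sketch of unpacking that tuple, reusing the same array k as the program (the variable names positions and indices are illustrative):

```python
import numpy as np

k = np.array([1, 2, 3, 4, 5, 4, 4])

# np.where(condition) returns a tuple with one index array per dimension.
positions = np.where(k == 4)
indices = positions[0]          # unpack the single index array of a 1-D input

print("indices:", indices)      # positions at which k equals 4
print("values:", k[indices])    # the matching elements themselves
print("count:", indices.size)   # how many matches were found
```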

RESULT:-
Thus the above program is executed successfully .


Ex No:3 WORKING WITH PANDAS DATA FRAMES

Date:

AIM:-

To write a python program using Pandas data frames.

ALGORITHM:-

1.Start the program.

2.Import pandas package.

3.Create two different dictionaries containing employee data.

4.Convert the dictionaries into data frames.

5.Use the concat() method to join the two data frames into a single data frame.

6.Stop the program.

PROGRAM:

import pandas as pd

#Define a dictionary containing employee data
data1={'Name':['Jai','Princi','Gaurav','Anuj'],
       'Age':[27,24,22,32],
       'Address':['Nagpur','Kanpur','Allahabad','Kannuaj'],
       'Qualification':['M.sc','MA','MCA','P.hd']}

#Define a second dictionary containing employee data
data2={'Name':['Abhi','Ayushi','Dhiraj','Hitesh'],
       'Age':[17,14,12,52],
       'Address':['Nagpur','Kanpur','Allahabad','Kannuaj'],
       'Qualification':['B.tech','BA','B.com','B.hons']}

#Convert the first dictionary into a DataFrame with index 0 to 3
df=pd.DataFrame(data1,index=[0,1,2,3])

#Convert the second dictionary into a DataFrame with index 4 to 7
df1=pd.DataFrame(data2,index=[4,5,6,7])
print("DataFrame1\n\n",df,"\n\n","DataFrame2\n\n",df1,"\n\n")

#Using the concat() method to stack the two DataFrames row-wise
frames=[df,df1]
res1=pd.concat(frames)
print("The concatenating Dataframes 1 and 2 \n\n",res1)

OUTPUT:

DataFrame1

     Name  Age    Address  Qualification
0     Jai   27     Nagpur           M.sc
1  Princi   24     Kanpur             MA
2  Gaurav   22  Allahabad            MCA
3    Anuj   32    Kannuaj           P.hd

DataFrame2

     Name  Age    Address  Qualification
4    Abhi   17     Nagpur         B.tech
5  Ayushi   14     Kanpur             BA
6  Dhiraj   12  Allahabad          B.com
7  Hitesh   52    Kannuaj         B.hons

The concatenating Dataframes 1 and 2

     Name  Age    Address  Qualification
0     Jai   27     Nagpur           M.sc
1  Princi   24     Kanpur             MA
2  Gaurav   22  Allahabad            MCA
3    Anuj   32    Kannuaj           P.hd
4    Abhi   17     Nagpur         B.tech
5  Ayushi   14     Kanpur             BA
6  Dhiraj   12  Allahabad          B.com
7  Hitesh   52    Kannuaj         B.hons

RESULT:-

Thus the above program is executed successfully.
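
The program gives the two data frames non-overlapping index labels before joining them. When the labels do overlap, concat() keeps them as they are unless ignore_index=True is passed. A short sketch with two made-up two-row frames (first and second are illustrative names):

```python
import pandas as pd

first = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]}, index=[0, 1])
second = pd.DataFrame({"Name": ["Abhi", "Ayushi"], "Age": [17, 14]}, index=[0, 1])

# Default behaviour: the original labels are kept, so 0 and 1 appear twice.
kept = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([first, second], ignore_index=True)

print(kept)
print(renumbered)
```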


Exno: 4a READING DATA FROM TEXT FILE, EXCEL AND WEB

Date:

AIM:-

To write a python program for reading data from text file, excel file and web.

ALGORITHM:-

1. Start the program.

2. Import pandas package.

3. Create the files file.txt and test.xlsx.

4. Save them to the same directory as the program.

5. Create test.txt and save it to the D:\cse directory.

6. Use the open(), read_excel() and read() methods to read the corresponding files.

7. Stop the program.

PROGRAM :

import pandas as pd

#Reading text files
with open("file.txt","r") as f:
    print("Reading text file")
    print(f.read())

#Reading excel files
df=pd.read_excel('test.xlsx',sheet_name='Employee')
print("Reading excel files")
print(df)

#Reading file from web or other source
#A raw string keeps "\t" in the Windows path from being read as a tab character
with open(r"D:\cse\test.txt","r") as f:
    print("Reading file from web or another location")
    print(f.read())

INPUT:-


OUTPUT:-

Reading text file

Hii ! How are you ?

Reading excel files

S.no Name

0 1 Steve

1 2 Andrea

2 3 Mike

3 4 John

4 5 Max

5 6 Emma

Reading file from web or another location

HI! WELCOME TO FODS LAB

RESULT:-

Thus the above program is executed successfully.
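
The program reads the third file from another local directory. For data that actually lives on the web, pandas can read a CSV directly from an http(s) address. The sketch below is an assumption-laden illustration: read_table is a hypothetical helper, the sample rows are made up, and the URL in the final comment is a placeholder, so the demonstration itself runs offline against a temporary local file.

```python
import os
import tempfile

import pandas as pd


def read_table(source):
    """Read a CSV table from a local path or an http(s) URL."""
    # pandas.read_csv accepts URLs as well as file paths.
    return pd.read_csv(source)


# Offline demonstration with a temporary local file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.csv")
    with open(path, "w") as handle:
        handle.write("S.no,Name\n1,Steve\n2,Andrea\n")
    local = read_table(path)

print(local)

# For a web source, pass the address instead (hypothetical URL shown):
# remote = read_table("https://example.com/data/sample.csv")
```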


Ex No:4b EXPLORATORY DESCRIPTIVE ANALYSIS ON IRIS DATA SET

Date:
AIM:-
To write a python program for the exploratory descriptive analysis on Iris data set.
ALGORITHM:-
1. Start the program
2. Import pandas package.
3. Read Iris.csv file.
4. Perform Exploratory Descriptive Analysis on the Iris data set by using
head()
shape
info()
describe()
value_counts()
drop_duplicates()
5. Stop the program.
PROGRAM:-
import pandas as pd

#reading Iris file
df=pd.read_csv("Irisdata.csv")

#1.print top five rows
print("\nfirst 5 rows\n")
print(df.head())

#2.print number of rows and columns
print("\n rows and columns\n")
print(df.shape)

#3.print columns and their datatypes
#info() prints the summary itself and returns None, so "None" is printed as well
print("\ncolumn and their datatypes\n")
print(df.info())

#4.print statistical values
print("\n statistical summary\n")
print(df.describe())

#5.checking duplicates: keep the first row of each variety
print("\n checking duplicates\n")
data=df.drop_duplicates(subset="variety")
print(data)

#6.value counts
print("\n variety counts \n")
print(df.value_counts("variety"))


Iris data set:-

s.no sepal.length sepal.width petal.length petal.width variety
1 5.1 3.5 1.4 0.2 Setosa
2 4.9 3 1.4 0.2 Setosa
3 4.7 3.2 1.3 0.2 Setosa
4 4.6 3.1 1.5 0.2 Setosa
5 5 3.6 1.4 0.2 Setosa
6 5.4 3.9 1.7 0.4 Setosa
7 4.6 3.4 1.4 0.3 Setosa
8 5 3.4 1.5 0.2 Setosa
9 4.4 2.9 1.4 0.2 Setosa
10 4.9 3.1 1.5 0.1 Setosa
11 5.4 3.7 1.5 0.2 Setosa
12 4.8 3.4 1.6 0.2 Setosa
13 4.8 3 1.4 0.1 Setosa
14 4.3 3 1.1 0.1 Setosa
15 5.8 4 1.2 0.2 Setosa
……
141 6.7 3.1 5.6 2.4 Virginica
142 6.9 3.1 5.1 2.3 Virginica
143 5.8 2.7 5.1 1.9 Virginica
144 6.8 3.2 5.9 2.3 Virginica
145 6.7 3.3 5.7 2.5 Virginica
146 6.7 3 5.2 2.3 Virginica
147 6.3 2.5 5 1.9 Virginica
148 6.5 3 5.2 2 Virginica
149 6.2 3.4 5.4 2.3 Virginica
150 5.9 3 5.1 1.8 Virginica

OUTPUT:-
first 5 rows
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
3 4.6 3.1 1.5 0.2 Setosa
4 5.0 3.6 1.4 0.2 Setosa

rows and columns

(150, 5)


Column and their datatypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal.length 150 non-null float64
1 sepal.width 150 non-null float64
2 petal.length 150 non-null float64
3 petal.width 150 non-null float64
4 variety 150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None

Statistical values

sepal.length sepal.width petal.length petal.width

count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333
std 0.828066 0.435866 1.765298 0.762238
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

Checking duplicates

sepal.length sepal.width petal.length petal.width variety

0 5.1 3.5 1.4 0.2 Setosa
50 7.0 3.2 4.7 1.4 Versicolor
100 6.3 3.3 6.0 2.5 Virginica

variety counts

RESULT:-
Thus the above program for exploratory descriptive analysis on the Iris data set is verified.
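
describe() summarises the whole data set at once. A per-variety summary is often more informative, and can be sketched with groupby(). The example below is an illustration only: it uses six rows copied from the Iris listing in this exercise (three Setosa, three Virginica) instead of the full file.

```python
import pandas as pd

# Six rows copied from the Iris listing in this exercise (three per variety).
sample = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 4.7, 6.7, 6.9, 5.8],
    "petal.length": [1.4, 1.4, 1.3, 5.6, 5.1, 5.1],
    "variety": ["Setosa"] * 3 + ["Virginica"] * 3,
})

# One row per variety, with mean, minimum and maximum of every numeric column.
per_variety = sample.groupby("variety").agg(["mean", "min", "max"])
print(per_variety)
```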


Ex.No:5.1.a UNIVARIATE ANALYSIS - PIMA INDIANS DIABETES FOR DIABETIC PATIENTS

Date:
AIM
To write a Python program for univariate analysis using the Pima Indians Diabetes data set for diabetic
patients.
ALGORITHM :-
Step 1: Start the program
Step 2: Import the pandas package
Step 3: Read data from the Pima Indians Diabetes.csv data set
Step 4: Separate the diabetic patients from the data set
Step 5: Calculate frequency, mean, median, mode, standard deviation, variance, skewness and kurtosis by using the
built-in functions value_counts(), mean(), median(), mode(), std(), var(), skew(), kurt() for the independent fields
pregnant, glucose, bp, skin, insulin, bmi, pedigree and age in the data set
Step 6: Stop the program
PROGRAM:
import pandas as pd
df=pd.read_csv("pima1.csv")
#keep only the diabetic patients (outcome is 1)
df1=df[df.outcome==1]
#print frequency, mean, standard deviation, mode, variance, skewness and kurtosis of each field
for col in ['pregnant','glucose','bp','skin','insulin','bmi','pedigree','age']:
    print(df1[col].value_counts())
    print(df1[col].mean())
    print(df1[col].std())
    print(df1[col].mode())
    print(df1[col].var())
    print(df1[col].skew())
    print(df1[col].kurt())


Pima Indians Diabetics.csv

s.no pregnant glucose bp skin insulin bmi pedigree age class
1 6 148 72 35 0 33.6 0.627 50 1
2 1 85 66 29 0 26.6 0.351 31 0
3 8 183 64 0 0 23.3 0.672 32 1
4 1 89 66 23 94 28.1 0.167 21 0
5 0 137 40 35 168 43.1 2.288 33 1
6 5 116 74 0 0 25.6 0.201 30 0
7 3 78 50 32 88 31 0.248 26 1
8 10 115 0 0 0 35.3 0.134 29 0
9 2 197 70 45 543 30.5 0.158 53 1
10 8 125 96 0 0 0 0.232 54 1
11 4 110 92 0 0 37.6 0.191 30 0
12 10 168 74 0 0 38 0.537 34 1
13 10 139 80 0 0 27.1 1.441 57 0
14 1 189 60 23 846 30.1 0.398 59 1
15 5 166 72 19 175 25.8 0.587 51 1
16 7 100 0 0 0 30 0.484 32 1
17 0 118 84 47 230 45.8 0.551 31 1
18 7 107 74 0 0 29.6 0.254 31 1
19 1 103 30 38 83 43.3 0.183 33 0
20 1 115 70 30 96 34.6 0.529 32 1
………
761 2 88 58 26 16 28.4 0.766 22 0
762 9 170 74 31 0 44 0.403 43 1
763 9 89 62 0 0 22.5 0.142 33 0
764 10 101 76 48 180 32.9 0.171 63 0
765 2 122 70 27 0 36.8 0.34 27 0
766 5 121 72 23 112 26.2 0.245 30 0
767 1 126 60 0 0 30.1 0.349 47 1
768 1 93 70 31 0 30.4 0.315 23 0

OUTPUT:
8 2
6 1
0 1
3 1
2 1
10 1
1 1
5 1
Name: pregnant, dtype: int64
4.777777777777778
3.492054473292827


0 8
Name: pregnant, dtype: int64
12.194444444444445
0.07976827393633572
-1.4086477342894645
148 1
183 1
137 1
78 1
197 1
125 1
168 1
189 1
166 1
Name: glucose, dtype: int64
154.55555555555554
37.4069215223303
0 78
1 125
2 137
3 148
4 166
5 168
6 183
7 189
8 197
Name: glucose, dtype: int64
1399.2777777777776
-1.0313853707975722
0.978100480723807
72 2
64 1
40 1
50 1
70 1
96 1
74 1
60 1
Name: bp, dtype: int64
66.44444444444444
15.898986690282428
0 72
Name: bp, dtype: int64
252.77777777777777
0.13656074039998925
1.0136266496454889
0 3
35 2
32 1
45 1
23 1


19 1
Name: skin, dtype: int64
21.0
17.392527130926087
0 0
Name: skin, dtype: int64
302.5
-0.21810449977720303
-1.6245455521188052
0 4
168 1
88 1
543 1
846 1
175 1
Name: insulin, dtype: int64
202.22222222222223
297.7233521987223
0 0
Name: insulin, dtype: int64
88639.19444444445
1.6550059400947466
1.9781799097232406
33.6 1
23.3 1
43.1 1
31.0 1
30.5 1
0.0 1
38.0 1
30.1 1
25.8 1
Name: bmi, dtype: int64
28.37777777777778
12.189521912053994
0 0.0
1 23.3
2 25.8
3 30.1
4 30.5
5 31.0
6 33.6
7 38.0
8 43.1
Name: bmi, dtype: float64
148.58444444444447
-1.6632175863057026
3.989200103644359
0.627 1
0.672 1
2.288 1


0.248 1
0.158 1
0.232 1
0.537 1
0.398 1
0.587 1
Name: pedigree, dtype: int64
0.6385555555555555
0.6462886566989844
0 0.158
1 0.232
2 0.248
3 0.398
4 0.537
5 0.587
6 0.627
7 0.672
8 2.288
Name: pedigree, dtype: float64
0.41768902777777767
2.521186410463069
6.957869024161264
50 1
32 1
33 1
26 1
53 1
54 1
34 1
59 1
51 1
Name: age, dtype: int64
43.55555555555556
12.135805608931685
0 26
1 32
2 33
3 34
4 50
5 51
6 53
7 54
8 59
Name: age, dtype: int64
147.27777777777774
-0.23884551536673332
-1.9149668580541754

RESULT:-
Thus the above program for univariate analysis on the Pima Indians Diabetes data set for diabetic patients is verified.
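
The same measures can be collected in a single call with agg(). The sketch below is an illustration under one assumption: the nine pregnant values are read off the frequency output of this exercise (8 occurs twice, the others once) rather than loaded from pima1.csv.

```python
import pandas as pd

# Number of pregnancies of the nine diabetic patients, taken from the
# frequency output of this exercise.
pregnant = pd.Series([6, 8, 0, 3, 2, 8, 10, 1, 5], name="pregnant")

# Each string names a Series method; the result is a Series indexed by those names.
summary = pregnant.agg(["mean", "median", "std", "var", "skew", "kurt"])

print(summary)
print("mode:", pregnant.mode().tolist())
```

The mean and variance printed here can be compared with the first block of the output listed above.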


Ex.No:5.1.b UNIVARIATE ANALYSIS - PIMA INDIANS DIABETES FOR NON DIABETIC PATIENTS

Date:
AIM
To write a Python program for univariate analysis using the Pima Indians Diabetes data set for non diabetic
patients.
ALGORITHM :-
Step 1: Start the program
Step 2: Import the pandas package
Step 3: Read data from the Pima Indians Diabetes.csv data set
Step 4: Separate the non diabetic patients from the data set
Step 5: Calculate frequency, mean, median, mode, standard deviation, variance, skewness and kurtosis by using the built-in
functions value_counts(), mean(), median(), mode(), std(), var(), skew(), kurt() for the independent fields
pregnant, glucose, bp, skin, insulin, bmi, pedigree and age in the data set
Step 6: Stop the program
PROGRAM:
import pandas as pd
df=pd.read_csv("pima1.csv")
#keep only the non diabetic patients (outcome is 0)
df1=df[df.outcome==0]
#print frequency, mean, median, standard deviation, mode, variance, skewness and kurtosis of each field
for col in ['pregnant','glucose','bp','skin','insulin','bmi','pedigree','age']:
    print(df1[col].value_counts())
    print(df1[col].mean())
    print(df1[col].median())
    print(df1[col].std())
    print(df1[col].mode())
    print(df1[col].var())
    print(df1[col].skew())
    print(df1[col].kurt())


Pima Indians Diabetics.csv

OUTPUT:
1 2
10 2
5 1
4 1
Name: pregnant, dtype: int64
5.166666666666667
4.5
4.070217029430577
0 1
1 10
Name: pregnant, dtype: int64


16.56666666666667
0.3539476771784376
-1.9239379941621584
85 1
89 1
116 1
115 1
110 1
139 1
Name: glucose, dtype: int64
109.0
112.5
19.809088823063014
0 85
1 89
2 110
3 115
4 116
5 139
Name: glucose, dtype: int64
392.4
0.221379243643542
-0.31517019081197173
66 2
74 1
0 1
92 1
80 1
Name: bp, dtype: int64
63.0
70.0
32.36664950222682
0 66
Name: bp, dtype: int64
1047.6
-1.9408208876079585
4.311601666734459
0 4
29 1
23 1
Name: skin, dtype: int64
8.666666666666666
0.0
13.559744343705992
0 0
Name: skin, dtype: int64
183.86666666666665
1.052575991628368
-1.3694264585166165
0 5
94 1


Name: insulin, dtype: int64

15.666666666666666
0.0
38.37533930360312
0 0
Name: insulin, dtype: int64
1472.6666666666665
2.4494897427831783
6.0
26.6 1
28.1 1
25.6 1
35.3 1
37.6 1
27.1 1
Name: bmi, dtype: int64
30.05
27.6
5.074938423271753
0 25.6
1 26.6
2 27.1
3 28.1
4 35.3
5 37.6
Name: bmi, dtype: float64
25.75499999999999
0.9474768598128397
-1.3608302417826117
0.351 1
0.167 1
0.201 1
0.134 1
0.191 1
1.441 1
Name: pedigree, dtype: int64
0.41416666666666674
0.196
0.5085675635219638
0 0.134
1 0.167
2 0.191
3 0.201
4 0.351
5 1.441
Name: pedigree, dtype: float64
0.25864096666666664
2.336696757512407
5.534561849601891
30 2
31 1


21 1
29 1
57 1
Name: age, dtype: int64
33.0
30.0
12.312595177297109
0 30
Name: age, dtype: int64
151.6
1.923829603041346
4.4999860763988

RESULT:-
Thus the above program for univariate analysis on the Pima Indians Diabetes data set for non diabetic patients is verified.


Ex.No:5.2.a BIVARIATE ANALYSIS - LINEAR REGRESSION - PIMA INDIANS DIABETES FOR DIABETIC PATIENTS

Date:

AIM
To write a Python program for linear regression using the Pima Indians Diabetes data set for diabetic
patients.
ALGORITHM
Step 1: Start the program
Step 2: Import the pandas, matplotlib.pyplot and scipy packages
Step 3: Read data from the Pima Indians Diabetes.csv data set
Step 4: Separate the diabetic patients from the data set
Step 5: Get the x, y values from the data set for linregress()
Step 6: Plot the x, y values as a scatter plot
Step 7: Plot the regression line by calculating y=bx+a, where b is the slope and a is the intercept
Step 8: Stop the program
PROGRAM:
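import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
df=pd.read_csv("pima1.csv")
#keep only the diabetic patients (outcome is 1)
df1=df[df.outcome==1]
x=df1["age"]
y=df1["bp"]
#fit the least-squares line y=slope*x+intercept
slope,intercept,r,p,std_err=stats.linregress(x,y)
def myfunc(x):
    return slope*x+intercept
mymodel=list(map(myfunc,x))
#scatter plot of the data with the fitted line drawn over it
plt.scatter(x,y)
plt.plot(x,mymodel)
plt.show()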


OUTPUT:

RESULT:
Thus the above program for bivariate analysis with linear regression using the Pima Indians Diabetes data set for
diabetic patients is verified.
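
The slope b and intercept a of the line y=bx+a follow from the least-squares formulas b = cov(x, y) / var(x) and a = mean(y) - b * mean(x). The sketch below checks these formulas against linregress(). It is an illustration under an assumption: the nine (age, bp) pairs are the class 1 rows among the first 15 rows of the Pima listing in Ex. 5.1.a, typed in by hand instead of being read from pima1.csv.

```python
import numpy as np
from scipy import stats

# Age and blood pressure of the nine class 1 patients among the first 15 rows
# of the Pima listing in this manual (assumed sample, typed in by hand).
age = np.array([50, 32, 33, 26, 53, 54, 34, 59, 51], dtype=float)
bp = np.array([72, 64, 40, 50, 70, 96, 74, 60, 72], dtype=float)

# Closed-form least squares: b = cov(x, y) / var(x), a = mean(y) - b * mean(x)
b = np.sum((age - age.mean()) * (bp - bp.mean())) / np.sum((age - age.mean()) ** 2)
a = bp.mean() - b * age.mean()

fit = stats.linregress(age, bp)
print("manual slope, intercept:", b, a)
print("linregress slope, intercept:", fit.slope, fit.intercept)
```

Both lines of output should agree up to floating-point rounding, since linregress() solves the same least-squares problem.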


Ex.No:5.2.b BIVARIATE ANALYSIS - LINEAR REGRESSION - PIMA INDIANS DIABETES FOR NON DIABETIC PATIENTS

Date:

AIM
To write a Python program for linear regression using the Pima Indians Diabetes data set for non diabetic
patients.
ALGORITHM
Step 1: Start the program
Step 2: Import the pandas, matplotlib.pyplot and scipy packages
Step 3: Read data from the Pima Indians Diabetes.csv data set
Step 4: Separate the non diabetic patients from the data set
Step 5: Get the x, y values from the data set for linregress()
Step 6: Plot the x, y values as a scatter plot
Step 7: Plot the regression line by calculating y=bx+a, where b is the slope and a is the intercept
Step 8: Stop the program
PROGRAM:
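import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
df=pd.read_csv("pima1.csv")
#keep only the non diabetic patients (outcome is 0)
df1=df[df.outcome==0]
x=df1["age"]
y=df1["bp"]
#fit the least-squares line y=slope*x+intercept
slope,intercept,r,p,std_err=stats.linregress(x,y)
def myfunc(x):
    return slope*x+intercept
mymodel=list(map(myfunc,x))
#scatter plot of the data with the fitted line drawn over it
plt.scatter(x,y)
plt.plot(x,mymodel)
plt.show()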


OUTPUT:

RESULT:
Thus the above program for bivariate analysis with linear regression using the Pima Indians Diabetes data set for
non diabetic patients is verified.


Ex.No:5.3 BIVARIATE ANALYSIS - LOGISTIC REGRESSION - PIMA INDIANS DIABETES

Date:
AIM
To write a Python program for logistic regression using the Pima Indians Diabetes data set.
ALGORITHM
Step 1: Start the program
Step 2: Import the pandas and numpy packages and linear_model from sklearn
Step 3: Read data from the Pima Indians Diabetes.csv data set
Step 4: Get the x, y values from the data set for linear_model.LogisticRegression()
Step 5: Predict y for the given x values based on the Pima Indians Diabetes data set
Step 6: Stop the program
PROGRAM:
import numpy as np
import pandas as pd
from sklearn import linear_model
df=pd.read_csv("pima1.csv")
#independent variable: blood pressure, dependent variable: outcome (0 or 1)
X=df[["bp"]]
y=df[["outcome"]]
logr=linear_model.LogisticRegression()
logr.fit(X.values,y.values.ravel())
#predict the class for a patient whose bp is 66
predicted=logr.predict(np.array([66]).reshape(1,-1))
print(predicted)
OUTPUT:
[1]

RESULT:
Thus the above program for bivariate analysis using logistic regression on the Pima Indians Diabetes data set is verified.
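
A fitted logistic regression model turns a feature value into a probability with the sigmoid function and then applies a threshold, usually 0.5, to obtain the class label. The sketch below shows that step in plain Python. The weight and bias are hypothetical numbers chosen for illustration, not the coefficients learned from pima1.csv. In scikit-learn the learned values are available as coef_ and intercept_ after fit().

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weight, bias, threshold=0.5):
    """Return (class label, probability) for one feature value."""
    probability = sigmoid(weight * x + bias)
    label = 1 if probability >= threshold else 0
    return label, probability

# Hypothetical coefficients, chosen only for illustration
weight, bias = 0.02, -1.0

for bp in (40, 66):
    label, probability = predict_class(bp, weight, bias)
    print(bp, label, round(probability, 3))
```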

Ex.No:5.4 MULTIPLE REGRESSION ANALYSIS

Date: PIMA INDIAN DIABETES

AIM
To write a Python program for multiple regression analysis using the Pima Indians Diabetes data set.
ALGORITHM
Step 1: Start the program.
Step 2: Import pandas and linear_model from the sklearn package.
Step 3: Read data from the Pima Indians Diabetes CSV file.
Step 4: Get X as the independent values and y as the dependent values from the data set for linear_model.LinearRegression().
Step 5: Predict y for the given x values based on the Pima Indians Diabetes data set.
Step 6: Stop the program.
PROGRAM:
import pandas as pd
from sklearn import linear_model
df=pd.read_csv("pima1.csv")
X=df[['age','bp']]
y=df['outcome']
regr=linear_model.LinearRegression()
regr.fit(X.values,y)
predicteddia=regr.predict([[62.11,63]])
print(predicteddia)
OUTPUT:
[1.00002405]

RESULT:
Thus the above program for multiple regression analysis using the Pima Indians Diabetes data set is verified.
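
Multiple regression fits one coefficient per independent variable plus an intercept. Because outcome is coded 0 or 1, LinearRegression returns a continuous number such as 1.00002405 rather than a class label. The sketch below illustrates the fitting step with NumPy's least-squares solver instead of scikit-learn. The data are hypothetical and generated from a known formula so that the recovered coefficients can be checked.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1, 1], [2, 1], [3, 2], [4, 3], [5, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Predict y for a new pair of x values
intercept, b1, b2 = coef
new_point = np.array([6.0, 4.0])
prediction = intercept + b1 * new_point[0] + b2 * new_point[1]
print(coef, prediction)
```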

Ex.No:5.5 RESULT ANALYSIS

Date:

AIM:
To perform result analysis for 5.1.a, 5.1.b, 5.2.a and 5.2.b using the Pima Indians Diabetes data set.
ANALYSIS
5.1.a vs 5.1.b
category | function | Diabetes | Non-Diabetes
(frequency entries are listed as value: count)
pregnant | frequency | 8: 2, 6: 1, 0: 1, 3: 1, 2: 1, 10: 1, 1: 1, 5: 1 | 1: 2, 10: 2, 5: 1, 4: 1
pregnant | mean | 4.777777777777778 | 5.166666666666667
pregnant | standard deviation | 3.492054473292827 | 4.070217029430577
pregnant | mode | 8 | 1, 10
pregnant | variance | 12.194444444444445 | 16.56666666666667
pregnant | skewness | 0.07976827393633572 | 0.3539476771784376
pregnant | kurtosis | -1.4086477342894645 | -1.9239379941621584

glucose | frequency | 148: 1, 183: 1, 137: 1, 78: 1, 197: 1, 125: 1, 168: 1, 189: 1, 166: 1 | 85: 1, 89: 1, 116: 1, 115: 1, 110: 1, 139: 1
glucose | mean | 154.55555555555554 | 109.0
glucose | standard deviation | 37.4069215223303 | 19.809088823063014
glucose | mode | 78, 125, 137, 148, 166, 168, 183, 189, 197 | 85, 89, 110, 115, 116, 139
glucose | variance | 1399.2777777777776 | 392.4
glucose | skewness | -1.0313853707975722 | 0.221379243643542
glucose | kurtosis | 0.978100480723807 | -0.31517019081197173

bp | frequency | 72: 2, 64: 1, 40: 1, 50: 1, 70: 1, 96: 1, 74: 1, 60: 1 | 66: 2, 74: 1, 0: 1, 92: 1, 80: 1
bp | mean | 66.44444444444444 | 63.0
bp | standard deviation | 15.898986690282428 | 32.36664950222682
bp | mode | 72 | 66
bp | variance | 252.77777777777777 | 1047.6
bp | skewness | 0.13656074039998925 | -1.9408208876079585
bp | kurtosis | 1.0136266496454889 | 4.311601666734459

skin | frequency | 0: 3, 35: 2, 32: 1, 45: 1, 23: 1, 19: 1 | 0: 4, 29: 1, 23: 1
skin | mean | 21.0 | 8.666666666666666
skin | standard deviation | 17.392527130926087 | 13.559744343705992
skin | mode | 0 | 0
skin | variance | 302.5 | 183.86666666666665
skin | skewness | -0.21810449977720303 | 1.052575991628368
skin | kurtosis | -1.6245455521188052 | -1.3694264585166165

insulin | frequency | 0: 4, 168: 1, 88: 1, 543: 1, 846: 1, 175: 1 | 0: 5, 94: 1
insulin | mean | 202.22222222222223 | 15.666666666666666
insulin | standard deviation | 297.7233521987223 | 38.37533930360312
insulin | mode | 0 | 0
insulin | variance | 88639.19444444445 | 1472.6666666666665
insulin | skewness | 1.6550059400947466 | 2.4494897427831783
insulin | kurtosis | 1.9781799097232406 | 6.0

bmi | frequency | 33.6: 1, 23.3: 1, 43.1: 1, 31.0: 1, 30.5: 1, 0.0: 1, 38.0: 1, 30.1: 1, 25.8: 1 | 26.6: 1, 28.1: 1, 25.6: 1, 35.3: 1, 37.6: 1, 27.1: 1
bmi | mean | 28.37777777777778 | 30.05
bmi | standard deviation | 12.189521912053994 | 5.074938423271753
bmi | mode | 0.0, 23.3, 25.8, 30.1, 30.5, 31.0, 33.6, 38.0, 43.1 | 25.6, 26.6, 27.1, 28.1, 35.3, 37.6
bmi | variance | 148.58444444444447 | 25.75499999999999
bmi | skewness | -1.6632175863057026 | 0.9474768598128397
bmi | kurtosis | 3.989200103644359 | -1.3608302417826117

pedigree | frequency | 0.627: 1, 0.672: 1, 2.288: 1, 0.248: 1, 0.158: 1, 0.232: 1, 0.537: 1, 0.398: 1, 0.587: 1 | 0.351: 1, 0.167: 1, 0.201: 1, 0.134: 1, 0.191: 1, 1.441: 1
pedigree | mean | 0.6385555555555555 | 0.41416666666666674
pedigree | standard deviation | 0.6462886566989844 | 0.5085675635219638
pedigree | mode | 0.158, 0.232, 0.248, 0.398, 0.537, 0.587, 0.627, 0.672, 2.288 | 0.134, 0.167, 0.191, 0.201, 0.351, 1.441
pedigree | variance | 0.41768902777777767 | 0.25864096666666664
pedigree | skewness | 2.521186410463069 | 2.336696757512407
pedigree | kurtosis | 6.957869024161264 | 5.534561849601891

age | frequency | 50: 1, 32: 1, 33: 1, 26: 1, 53: 1, 54: 1, 34: 1, 59: 1, 51: 1 | 30: 2, 31: 1, 21: 1, 29: 1, 57: 1
age | mean | 43.55555555555556 | 33.0
age | standard deviation | 12.135805608931685 | 12.312595177297109
age | mode | 26, 32, 33, 34, 50, 51, 53, 54, 59 | 30
age | variance | 147.27777777777774 | 151.6
age | skewness | -0.23884551536673332 | 1.923829603041346
age | kurtosis | -1.9149668580541754 | 4.4999860763988
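
The variance and standard deviation figures in the table are consistent with the sample formulas, which divide by n - 1. The sketch below re-derives a few entries of the Non-Diabetes column using only the standard library, as a cross-check under the assumption that the listed frequencies are the complete data for that group. Where several values tie for most frequent, every tied value is reported as a mode.

```python
import statistics

# Glucose readings of the six non-diabetic patients listed in the table
glucose = [85, 89, 116, 115, 110, 139]

mean = statistics.mean(glucose)
variance = statistics.variance(glucose)   # sample variance, divides by n - 1
std_dev = statistics.stdev(glucose)       # square root of the sample variance

# Pregnancy counts of the same group, expanded from the frequency column
pregnant = [1, 1, 10, 10, 5, 4]
modes = statistics.multimode(pregnant)    # every value tied for most frequent

print(mean, variance, std_dev, modes)
```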

5.2.a vs 5.2.b

5.2.a

5.2.b

RESULT:-
Thus the results of programs 5.1.a, 5.1.b, 5.2.a and 5.2.b are analyzed.

Ex.no:6.1 NORMAL CURVES –IRIS DATA SET

Date:

AIM
To write a Python program for normal curves using the Iris data set.

ALGORITHM:-
Step 1: Start the program.
Step 2: Import the pandas and matplotlib.pyplot packages.
Step 3: Read data from the Iris CSV data set.
Step 4: Get the x and y values from the data set.
Step 5: Plot a dotted line curve using the x and y values.
Step 6: Stop the program.

PROGRAM:-
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("irisdata.csv")
x = df['sepal.length']
y = df['petal.length']
# Join the (x, y) points with a dotted line
plt.plot(x, y, linestyle='dotted')
plt.show()

OUTPUT:-

RESULT:-

Thus the above program for normal curves using the Iris data set is verified.
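
The program above joins sepal.length and petal.length with a dotted line. In the statistical sense, a normal curve is the bell-shaped density determined by the mean and standard deviation of one column. The sketch below evaluates that density with the standard library only. The ten sepal lengths are assumed values for illustration, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Assumed sepal lengths in cm, used only for illustration
sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]

# Fit a normal distribution using the sample mean and standard deviation
dist = NormalDist.from_samples(sepal_lengths)

# Evaluate the density from 3 standard deviations below to 3 above the mean
offsets = [-3, -2, -1, 0, 1, 2, 3]
xs = [dist.mean + k * dist.stdev for k in offsets]
ys = [dist.pdf(x) for x in xs]

for x, y in zip(xs, ys):
    print(round(x, 3), round(y, 4))
```

Passing xs and ys to plt.plot() would draw the bell curve. Using more points between the two ends gives a smoother line.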

Ex.no:6.2 DENSITY AND CONTOUR PLOTS -

Date: PIMA INDIANS DIABETES DATA SET
AIM:
To write a Python program for density and contour plots using the Pima Indians Diabetes data set.
ALGORITHM:
Step 1: Start the program.
Step 2: Import the numpy, pandas and matplotlib.pyplot packages.
Step 3: Read data from the Pima Indians Diabetes CSV file.
Step 4: Get the x and y values from the data set.
Step 5: Generate the combinations of grid points using meshgrid().
Step 6: Define a function func() for calculating the third value z.
Step 7: Draw the rectangular contour plot using contour().
Step 8: Stop the program.
PROGRAM:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("pima Indian Diabetes.csv")

# Define a function that computes the third value z
def func(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2

# Get x, y
x = df['bp']
y = df['glucose']
# Generate combinations of grid points
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
# Draw rectangular contour plot
plt.contour(X, Y, Z, cmap='gist_rainbow_r')
plt.show()

OUTPUT:-

RESULT:-
Thus the above program for density and contour plots using the Pima Indians Diabetes data set is verified.
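
meshgrid() turns two one-dimensional axes into two-dimensional coordinate arrays, and the function is then evaluated at every grid point to give the Z array that contour() draws. The sketch below shows the shapes involved. It uses evenly spaced hypothetical axes from np.linspace() rather than data set columns, which is an assumption made to keep the example self-contained.

```python
import numpy as np

def func(x, y):
    """Third value z at each (x, y) grid point."""
    return np.sin(x) ** 2 + np.cos(y) ** 2

# Hypothetical evenly spaced axes: 50 points along x, 40 along y
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

# X and Y both have one row per y value and one column per x value
X, Y = np.meshgrid(x, y)
Z = func(X, Y)

print(X.shape, Y.shape, Z.shape)
print(Z.min(), Z.max())
```

Since sin squared and cos squared each lie between 0 and 1, every Z value lies between 0 and 2, which fixes the range of the contour levels.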

Ex.No:6.3.a CORRELATION – IRIS DATA SET

Date:

AIM:
To write a Python program for correlation using the Iris data set.
ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Calculate the correlation coefficient r using corr(method="pearson"), defined in the pandas package.
5. Print the coefficient r for each pair of columns.
6. Stop the program.
PROGRAM:-
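import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
# Correlate only the numeric columns; a text column such as variety is excluded
x=df.select_dtypes(include='number').corr(method='pearson')
print(x)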

OUTPUT:
              sepal.length  sepal.width  petal.length  petal.width
sepal.length      1.000000    -0.117570      0.871754     0.817941
sepal.width      -0.117570     1.000000     -0.428440    -0.366126
petal.length      0.871754    -0.428440      1.000000     0.962865
petal.width       0.817941    -0.366126      0.962865     1.000000

RESULT:
Thus the above program for correlation using the Iris data set is verified.
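
Each entry of the correlation matrix is a Pearson coefficient r, which divides the sum of cross-deviations of two columns by the square root of the product of their sums of squared deviations. The sketch below computes r for a single pair of columns in plain Python. The helper name pearson_r and the sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements, invented for illustration
a = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases with a
falling = [10, 8, 6, 4, 2]   # decreases as a increases

print(pearson_r(a, rising))
print(pearson_r(a, falling))
```

A value near +1 means the two columns rise together, as with petal.length and petal.width in the output above. A negative value means one falls as the other rises.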

Ex.No:6.3.b SCATTER PLOT- IRIS DATA SET

Date:
AIM:
To write a Python program for a scatter plot using the Iris data set.
ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Draw the scatter plot for the 3 different varieties of flowers using their sepal length and petal length.
5. Stop the program.

PROGRAM:-
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
df1=df[df.variety=='Setosa']
x1=df1['sepal.length']
y1=df1['petal.length']
df2=df[df.variety=='Versicolor']
x2=df2['sepal.length']
y2=df2['petal.length']
df3=df[df.variety=='Virginica']
x3=df3['sepal.length']
y3=df3['petal.length']
plt.scatter(x1,y1,color='red',marker='o',label='Setosa')
plt.scatter(x2,y2,color='blue',marker='s',label='Versicolor')
plt.scatter(x3,y3,color='green',marker='x',label='Virginica')
plt.title('iris_scatterplot')
plt.xlabel("sepal.length[cm]")
plt.ylabel("petal.length[cm]")
# Show the labels passed to scatter() so the three varieties can be told apart
plt.legend()
plt.show()

OUTPUT:

RESULT:
Thus the above program for a scatter plot using the Iris data set is verified.
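
The program filters the data frame three times, once per variety. As an alternative sketch, pandas groupby() can yield the same three subsets in a loop. The small data frame below is hypothetical and only mimics the column names of iris.csv. The plotting call is left as a comment so that the example runs without opening a window.

```python
import pandas as pd

# Hypothetical rows with the same column names as iris.csv
df = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal.length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# One (colour, marker) style per variety, as in the program above
styles = {"Setosa": ("red", "o"),
          "Versicolor": ("blue", "s"),
          "Virginica": ("green", "x")}

# groupby() yields one sub-frame per variety, replacing df1, df2 and df3
series = {}
for variety, group in df.groupby("variety"):
    color, marker = styles[variety]
    series[variety] = (group["sepal.length"].tolist(),
                       group["petal.length"].tolist())
    # With matplotlib imported as plt, each group could be drawn with:
    # plt.scatter(group["sepal.length"], group["petal.length"],
    #             color=color, marker=marker, label=variety)

print(series)
```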

Ex.No:6.4 HISTOGRAM – IRIS DATA SET

Date:

AIM:
To write a Python program for histograms using the Iris data set.

ALGORITHM:
1. Start the program.
2. Import the pandas and matplotlib.pyplot packages.
3. Read the iris.csv file.
4. Draw the histogram using hist() with sepal.length on the x-axis.
5. Draw the histogram using hist() with sepal.width on the x-axis.
6. Draw the histogram using hist() with petal.length on the x-axis.
7. Draw the histogram using hist() with petal.width on the x-axis.
8. Stop the program.

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
x1=df["sepal.width"]
plt.subplot(2,2,1)
plt.hist(x1,bins=20,color="green")
plt.title("sepal width in cm")
plt.xlabel("sepal width cm")
plt.ylabel("count")
x2=df["sepal.length"]
plt.subplot(2,2,2)
plt.hist(x2,bins=20,color="blue")
plt.title("sepal length in cm")
plt.xlabel("sepal length cm")
plt.ylabel("count")
x3=df["petal.length"]
plt.subplot(2,2,3)
plt.hist(x3,bins=20,color="red")
plt.title("petal length in cm")
plt.xlabel("petal length cm")
plt.ylabel("count")
x4=df["petal.width"]
plt.subplot(2,2,4)
plt.hist(x4,bins=20,color="yellow")
plt.title("petal width in cm")
plt.xlabel("petal width cm")
plt.ylabel("count")
plt.show()

OUTPUT:

RESULT:
Thus the above program for histograms using the Iris data set is verified.
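
A histogram splits the range of a column into equal-width bins and counts how many values fall into each one. The sketch below performs that binning with np.histogram() on a short hypothetical list, so the counts behind the bars can be inspected directly.

```python
import numpy as np

# Hypothetical measurements, invented for illustration
values = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Split the range from min to max into 4 equal-width bins
counts, edges = np.histogram(values, bins=4)

print(counts)   # number of values in each bin
print(edges)    # the 5 boundaries that delimit the 4 bins
```

With bins=20, as in the program above, the same range is cut into 20 narrower bins, so each bar covers fewer values.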

Ex.No:6.5 THREE DIMENSIONAL PLOTTING –

Date: PIMA INDIANS DIABETES SET

AIM:
To write a Python program for three-dimensional plotting using the Pima Indians Diabetes data set.

ALGORITHM:
1. Start the program.
2. Import mplot3d from mpl_toolkits, and the numpy, pandas and matplotlib.pyplot packages.
3. Read the pima1.csv file.
4. Read x and y.
5. Generate the grid for the X and Y axes using meshgrid().
6. Generate the Z axis using f(x,y).
7. Draw the wireframe by using plot_wireframe().
8. Stop the program.

PROGRAM:
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df=pd.read_csv("pima1.csv")
# function for the z values
def f(x,y):
    return np.sin(np.sqrt(x**2+y**2))
# x and y axis
x=df['age']
y=df['bp']
X,Y=np.meshgrid(x,y)
Z=f(X,Y)
fig=plt.figure()
ax=plt.axes(projection='3d')
ax.plot_wireframe(X,Y,Z,color='green')
ax.set_title('wireframe geeks for geeks')
plt.show()

OUTPUT:

RESULT:
Thus the above program for three-dimensional plotting using the Pima Indians Diabetes data set is verified.
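
Calling meshgrid() on two raw columns of length n produces an n by n grid, and unsorted or repeated values make the wireframe lines run back and forth. One possible refinement, sketched below under the assumption that only the distinct values matter, is to sort and de-duplicate each axis with np.unique() before building the grid. The age and bp readings are hypothetical.

```python
import numpy as np

def f(x, y):
    """Height of the surface at each (x, y) grid point."""
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# Hypothetical age and bp readings with repeats, in arbitrary order
age = np.array([50, 31, 32, 31, 50])
bp = np.array([72, 66, 64, 66])

# np.unique() sorts the values and drops repeats
age_axis = np.unique(age)
bp_axis = np.unique(bp)

X, Y = np.meshgrid(age_axis, bp_axis)
Z = f(X, Y)

print(age_axis, bp_axis, Z.shape)
```

The resulting X, Y and Z arrays can be passed to plot_wireframe() in the same way as in the program above.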

Ex.No:7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP

Date:

AIM:
To write a Python program for visualizing geographic data with Basemap.
ALGORITHM:
1. Start the program.
2. Import matplotlib.pyplot and Basemap from mpl_toolkits.basemap package.
3. Draw the figure window using figure().
4. Draw the coastline using drawcoastlines()
5. Draw the countries using drawcountries().
6. Fill the continents using fillcontinents().
7. Draw the boundaries using drawmapboundary().
8. Stop the program.
PROGRAM:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(12,12))
m=Basemap()
m.drawcoastlines(linewidth=1.0,linestyle='solid',color='black')
m.drawcountries(linewidth=1.0,linestyle='solid',color='k')
m.fillcontinents(color='coral',lake_color='aqua')
m.drawmapboundary(color='b',linewidth=2.0,fill_color='aqua')
plt.title("Filled map boundary",fontsize=20)
plt.show()

OUTPUT:

RESULT:-
Thus the above program for Visualizing Geographic Data with Basemap is verified.
