Grace Python Numpy MB

Uploaded by

bharathi

4931_Grace College of Engineering,

GRACE COLLEGE OF ENGINEERING


MULLAKKADU, THOOTHUKUDI -628005

DEPARTMENT OF COMPUTER SCIENCE AND


ENGINEERING

CS 3361 - DATA SCIENCE LABORATORY

II YEAR / III SEMESTER
REGULATION 2021

LAB

PREPARED BY
Mrs. P. JOY SUGANTHY BAI,
Assistant Professor
CSE Department
Grace College of Engineering, Thoothukudi

CS3361_DATA SCIENCE

List of Experiments
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
2. Working with NumPy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
   a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
   b. Bivariate analysis: Linear and logistic regression modeling
   c. Multiple Regression analysis
   d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
   a. Normal curves
   b. Density and contour plots
   c. Correlation and scatter plots
   d. Histograms
   e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap

Ex.No:1 Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages

Anaconda:

Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.

Jupyter Notebook:
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

NumPy:
NumPy is a Python library used for working with arrays. It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.

SciPy:
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python. It provides more utility functions for optimization, stats and signal processing. Like NumPy, SciPy is open source, so we can use it freely.
NumPy, which stands for Numerical Python, is used for the manipulation of elements of numerical array data. SciPy, which stands for Scientific Python, is used for numerical computations in Python. Both these packages provide extended functionality to work with Python.

Statsmodels:
Statsmodels is a popular library in Python that enables us to estimate and analyze various statistical models. It includes various models, such as ordinary least squares, weighted least squares, etc.

Pandas:
Pandas is a powerful library. It provides a large set of commands and features for analyzing data easily. We can use Pandas to perform tasks such as filtering data according to certain conditions, or segmenting and segregating the data according to preference.
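
Once the packages are installed, a quick way to explore them is to confirm which versions are present. The following is a minimal sketch, assuming the packages are published under the distribution names listed in PACKAGES (with "notebook" standing for Jupyter Notebook); the helper name installed_versions is hypothetical and not part of any library.

```python
from importlib import metadata

# Distribution names assumed for this check; "notebook" provides Jupyter Notebook.
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "notebook"]

def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

Running this cell in a Jupyter Notebook started from Anaconda Navigator should list a version for every package; a "not installed" entry points to a package that still has to be added.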

Download Anaconda:

1. Search for "Anaconda Download" in a web browser (for example, Google Chrome).
2. Scroll down the Anaconda products page and click the 64-bit installer.
3. Run the downloaded .exe file to install Anaconda.
4. Select the "Just Me" option.
5. Anaconda is stored under the C:\Users path.
6. After installation, open Anaconda Navigator.
7. Launch Jupyter Notebook for typing programs.
8. Click "New".
9. Click the Run button to get the output.
10. Sample programs

Result:
Thus the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages are downloaded and installed successfully.

Ex.No:2 Working with Numpy arrays

Aim:
To work with NumPy arrays using Jupyter Notebook.

Program 1:
# Python program to demonstrate
# basic array characteristics
import numpy as np

# Creating array object
arr = np.array( [[ 1, 2, 3],
                 [ 4, 2, 5]] )

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

Output:
Array is of type:  <class 'numpy.ndarray'>
No. of dimensions:  2
Shape of array:  (2, 3)
Size of array:  6
Array stores elements of type:  int64

Program 2:
# Python program to demonstrate
# array creation techniques
import numpy as np

# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)

# Creating array from tuple
b = np.array((1 , 3, 2))
print ("\nArray created using passed tuple:\n", b)

# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)

# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s. "
       "Array type is complex:\n", d)

# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)

# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)

# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between "
       "0 and 5:\n", g)

# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4],
                [5, 2, 4, 2],
                [1, 2, 0, 1]])

newarr = arr.reshape(2, 2, 3)

print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)

# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()

print ("\nOriginal array:\n", arr)
print ("Flattened array:\n", flarr)

Output:
Array created using passed list:
[[ 1. 2. 4.]
 [ 5. 8. 7.]]

Array created using passed tuple:
[1 3 2]

An array initialized with all zeros:
[[ 0. 0. 0. 0.]
 [ 0. 0. 0. 0.]
 [ 0. 0. 0. 0.]]

An array initialized with all 6s. Array type is complex:
[[ 6.+0.j 6.+0.j 6.+0.j]
 [ 6.+0.j 6.+0.j 6.+0.j]
 [ 6.+0.j 6.+0.j 6.+0.j]]

A random array:
[[ 0.46829566 0.67079389]
 [ 0.09079849 0.95410464]]

A sequential array with steps of 5:
[ 0 5 10 15 20 25]

A sequential array with 10 values between 0 and 5:
[ 0. 0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
 3.33333333 3.88888889 4.44444444 5. ]

Original array:
[[1 2 3 4]
 [5 2 4 2]
 [1 2 0 1]]
Reshaped array:
[[[1 2 3]
  [4 5 2]]

 [[4 2 1]
  [2 0 1]]]

Original array:
[[1 2 3]
 [4 5 6]]
Flattened array:
[1 2 3 4 5 6]
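
The flattening step above uses flatten(), which always copies the data. As a small sketch for comparison (the variable names are illustrative only), ravel() usually returns a view of the same memory, and reshape() accepts -1 so that NumPy infers one dimension:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()        # always returns a new, independent array
flat_view = arr.ravel()          # returns a view when the memory layout allows it
auto_shape = arr.reshape(3, -1)  # -1 lets NumPy infer the remaining dimension

flat_copy[0] = 99   # does not touch arr
flat_view[1] = 77   # writes through to arr because it is a view here

print(arr)
print(auto_shape.shape)
```

Choosing between the two is a trade-off: flatten() is safer when the result will be modified, while ravel() avoids a copy for large arrays.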

Program 3:
# Python program to demonstrate
# indexing in numpy
import numpy as np

# An exemplar array
arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])

# Slicing array
temp = arr[:2, ::2]
print ("Array with first 2 rows and alternate "
       "columns(0 and 2):\n", temp)

# Integer array indexing example
temp = arr[[0, 1, 2, 3], [3, 2, 1, 0]]
print ("\nElements at indices (0, 3), (1, 2), (2, 1), "
       "(3, 0):\n", temp)

# boolean array indexing example
cond = arr > 0 # cond is a boolean array
temp = arr[cond]
print ("\nElements greater than 0:\n", temp)
Output:

Array with first 2 rows and alternate columns(0 and 2):
[[-1. 0.]
 [ 4. 6.]]

Elements at indices (0, 3), (1, 2), (2, 1), (3, 0):
[ 4. 6. 0. 3.]

Elements greater than 0:
[ 2. 4. 4. 6. 2.6 7. 8. 3. 4. 2. ]

Program 4:

# Python program to demonstrate
# basic operations on single array
import numpy as np

a = np.array([1, 2, 5, 3])

# add 1 to every element
print ("Adding 1 to every element:", a+1)

# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)

# multiply each element by 10
print ("Multiplying each element by 10:", a*10)

# square each element
print ("Squaring each element:", a**2)

# modify existing array
a *= 2
print ("Doubled each element of original array:", a)

# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])

print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)
Output:
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]

Original array:
[[1 2 3]
 [3 4 5]
 [9 6 0]]
Transpose of array:
[[1 3 9]
 [2 4 6]
 [3 5 0]]
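
The element-wise operations in Program 4 work because NumPy applies an operation to every element without an explicit loop, and broadcasts smaller operands across larger ones. The sketch below, with illustrative variable names, compares the vectorized form with a loop and shows a row being broadcast over a matrix:

```python
import numpy as np

a = np.array([1, 2, 5, 3])

# Vectorized form, as used in the program above.
vectorized = a * 10 + 1

# Equivalent explicit loop, shown only for comparison.
looped = np.array([x * 10 + 1 for x in a])

# Broadcasting stretches a 1-D row across every row of a 2-D array.
m = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
row = np.array([10, 20, 30])
shifted = m + row

print(vectorized)
print(shifted)
```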

Program 5:
# Python program to demonstrate sorting in numpy
import numpy as np

a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])

# sorted array
print ("Array elements in sorted order:\n",
       np.sort(a, axis = None))

# sort array row-wise
print ("Row-wise sorted array:\n",
       np.sort(a, axis = 1))

# specify sort algorithm
print ("Column wise sort by applying merge-sort:\n",
       np.sort(a, axis = 0, kind = 'mergesort'))

# Example to show sorting of structured array
# set alias names for dtypes
# 'U10' stores names as text; 'S10' would print them as bytes (b'...') in Python 3
dtypes = [('name', 'U10'), ('grad_year', int), ('cgpa', float)]

# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n",
       np.sort(arr, order = 'name'))

print ("Array sorted by graduation year and then cgpa:\n",
       np.sort(arr, order = ['grad_year', 'cgpa']))
Output:
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
 [ 3 4 6]
 [-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
 [ 1 4 5]
 [ 3 4 6]]

Array sorted by names:
[('Aakash', 2009, 9.0) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
 ('Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
[('Pankaj', 2008, 7.9) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
 ('Aakash', 2009, 9.0)]

Result:
Thus the programs using NumPy were executed successfully.
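
np.sort returns the sorted values themselves. When the sort order has to be applied to a second array, np.argsort returns the indices instead. The following is a minimal sketch that reuses the names and cgpa values from the structured array above as two plain arrays (an arrangement assumed here for illustration):

```python
import numpy as np

cgpa = np.array([8.5, 8.7, 7.9, 9.0])
names = np.array(["Hrithik", "Ajay", "Pankaj", "Aakash"])

order = np.argsort(cgpa)            # indices that would sort cgpa in ascending order
ranked_names = names[order[::-1]]   # reversed, so the highest cgpa comes first

print(order)
print(ranked_names)
```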

Ex.No:3 Working with Pandas Data Frames

Aim:
To work with Pandas data frames using Jupyter Notebook.

Pandas Data Frames:

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was
created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Example1:

import pandas

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)
print(myvar)

Create Labels

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)
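
With labels in place, an element of the Series can be read either by its label or by its position. A short sketch using the same Series as above (the variable names by_label, by_position and subset are illustrative):

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = myvar["y"]         # label-based lookup
by_position = myvar.iloc[1]   # position-based lookup of the same element
subset = myvar[["x", "z"]]    # several labels at once

print(by_label, by_position)
print(subset)
```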

What is a DataFrame?

A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:


df = pd.DataFrame(data)

print(df)

Read CSV Files

A simple way to store big data sets is to use CSV files (comma separated files).

CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.

import pandas as pd

# Raw string (r'...') keeps the backslashes of the Windows path literal
df = pd.read_csv(r'C:\Users\New\Desktop\AD8302\data.csv')
print(df.to_string())
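
read_csv also accepts any file-like object, which makes it possible to try the call without a file on disk. The sketch below assumes a small, made-up CSV text (the column names and values are hypothetical, not the contents of data.csv):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,109,282.4
"""

df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo.to_string())
```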

Result:
Thus the programs using Data Frames were executed successfully.

Ex.No:4 Reading data from text files, Excel and the web and exploring various commands for doing descriptive analysis on the Iris data set.

Aim:
To read data from text files, Excel and the web and explore various commands for doing descriptive analysis on the Iris data set.

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a technique to analyze data using some visual techniques. With this technique, we can get detailed information about the statistical summary of the data.

Iris Dataset
The Iris Dataset is considered the Hello World of data science. It contains five columns, namely Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a flowering plant; researchers have measured various features of different iris flowers and recorded them digitally.

Read a web dataset:

import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

Output:

I. Getting Information about the Dataset

1. Shape: the shape attribute gives the shape of the dataset.
df.shape

Output:
(150, 6)
The dataframe contains 6 columns and 150 rows.

2. Info(): lists the columns and their data types.
df.info()
Output:

3. Describe(): The describe() function applies basic statistical computations on the dataset, such as extreme values, count of data points, standard deviation, etc. Any missing value or NaN value is automatically skipped. The describe() function gives a good picture of the distribution of data.
df.describe()

Output:

II. Checking Missing Values

1. Isnull(): We will check if our data contains any missing values or not. Missing values can occur when no information is provided for one or more items or for a whole unit.
df.isnull().sum()
Output:

2. Checking Duplicates:
The Pandas drop_duplicates() method helps in removing duplicates from the data frame.
data = df.drop_duplicates(subset ="class")

Output:

3. Count:
The Series.value_counts() function returns a Series containing counts of unique values.
df.value_counts("class")
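
The difference between the duplicate check and the count is easier to see on a few rows. The following sketch uses a hypothetical five-row miniature of the Iris table (only the column names follow the data set; the rows are made up):

```python
import pandas as pd

# Hypothetical miniature of the Iris table.
mini = pd.DataFrame({
    "sepallength": [5.1, 5.1, 7.0, 6.3, 6.3],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

n_duplicate_rows = mini.duplicated().sum()             # fully identical rows
one_per_class = mini.drop_duplicates(subset="class")   # keeps the first row of each class
class_counts = mini["class"].value_counts()            # rows per class

print(n_duplicate_rows)
print(one_per_class)
print(class_counts)
```

Note that drop_duplicates(subset="class") keeps a single row per class, so it is useful for listing the classes but discards most of the measurements.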

Output:

III. Data Visualization: We will use the Matplotlib and Seaborn libraries for the data visualization.
Matplotlib is an easy-to-use visualization library in Python. It is built on NumPy arrays, is designed to work with the broader SciPy stack, and provides several plots like line, bar, scatter, histogram, etc.
Seaborn is a library mostly used for statistical plotting in Python. It is built on top of Matplotlib and provides default styles and color palettes that make statistical plots more attractive.

Data visualization using the Matplotlib and Seaborn libraries:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='class', data=df)
plt.show()

Output:

Relation between variables

Hue: The hue parameter denotes which column decides the kind of color.
Legend(): A legend is an area describing the elements of the graph.
Bounding Box: bbox_to_anchor=[x0, y0] will create a bounding box with lower left corner at position [x0, y0]. The legend will then be placed 'inside' this box and overlap it according to the specified loc parameter.
Loc: The attribute loc in legend() is used to specify the location of the legend. The default value is loc="best"; the examples here use loc=2 (upper left).

Example 1: Comparing Sepal Length and Sepal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='sepallength', y='sepalwidth',
                hue='class', data=df, )

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

Output:

From the above plot, we can infer that:

• Species Setosa has smaller sepal lengths but larger sepal widths.
• Species Versicolor lies in the middle of the other two species in terms of sepal length and width.
• Species Virginica has larger sepal lengths but smaller sepal widths.

Example 2: Comparing Petal Length and Petal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='petallength', y='petalwidth',
                hue='class', data=df, )

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

Output:

From the above plot, we can infer that:

• Species Setosa has smaller petal lengths and widths.
• Species Versicolor lies in the middle of the other two species in terms of petal length and width.
• Species Virginica has the largest petal lengths and widths.

Let's plot the relationships between all the columns using a pairplot. It can be used for multivariate analysis.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df.drop(['Id'], axis = 1),
             hue='class', height=2)

Histograms

Histograms allow us to see the distribution of data in each column and can be used for uni- as well as bi-variate analysis.
Example:
import seaborn as sns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10,10))

axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['sepallength'], bins=7)


axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['sepalwidth'], bins=5);

axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['petallength'], bins=6);

axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['petalwidth'], bins=6);

Output:

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the dataframe. Any NA values are automatically excluded. Non-numeric columns are left out by passing numeric_only=True, which recent pandas versions require when the dataframe contains text columns.

data = df.drop_duplicates(subset ="class",)
data.corr(method='pearson', numeric_only=True)
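
Because drop_duplicates(subset="class") leaves only one row per class, the correlation above is computed from very few rows. As a hedged sketch of the same call on a slightly larger frame, the example below uses made-up values (only the column names mirror the Iris data) and passes numeric_only=True so that the text column is skipped explicitly:

```python
import pandas as pd

# Hypothetical values; only the column names mirror the Iris data.
sample = pd.DataFrame({
    "petallength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petalwidth":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["setosa", "setosa", "versicolor",
              "versicolor", "virginica", "virginica"],
})

# numeric_only=True restricts the calculation to the numeric columns.
corr_matrix = sample.corr(method="pearson", numeric_only=True)
print(corr_matrix)
```

A coefficient close to 1 indicates that the two columns increase together; the diagonal is always 1 because every column is perfectly correlated with itself.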

Output:

Box Plots

We can use boxplots to see how the categorical value is distributed with other numerical values.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

def graph(y):
    sns.boxplot(x="class", y=y, data=df)

plt.figure(figsize=(10,10))

# Adding the subplot at the specified
# grid position
plt.subplot(221)
graph('sepallength')

plt.subplot(222)
graph('sepalwidth')

plt.subplot(223)
graph('petallength')

plt.subplot(224)
graph('petalwidth')

plt.show()

Output:

Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called normal) objects. Outliers can be caused by measurement or execution errors. The analysis for outlier detection is referred to as outlier mining. There are many ways to detect the outliers, and the removal process is the same as removing a data item from the pandas dataframe.
Let's consider the iris dataset and plot the boxplot for the sepalwidth column.
Example:
# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('iris_csv.csv')

sns.boxplot(x='sepalwidth', data=df)

Output:

Removing Outliers

To remove outliers, one follows the same process as removing any entry from the dataset by its exact position, because every detection method ends with a list of the data items that satisfy the outlier definition of the method used.
Example: We will detect the outliers using IQR and then we will remove them. We will also draw the boxplot to see if the outliers are removed or not.
import pandas as pd
import seaborn as sns
import numpy as np

# Load the dataset
df = pd.read_csv('iris_csv.csv')

# IQR
# method= replaces the older interpolation= keyword (NumPy 1.22 and later)
Q1 = np.percentile(df['sepalwidth'], 25,
                   method = 'midpoint')

Q3 = np.percentile(df['sepalwidth'], 75,
                   method = 'midpoint')
IQR = Q3 - Q1

print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['sepalwidth'] >= (Q3+1.5*IQR))

# Lower bound
lower = np.where(df['sepalwidth'] <= (Q1-1.5*IQR))

# Removing the Outliers
df.drop(upper[0], inplace = True)
df.drop(lower[0], inplace = True)

print("New Shape: ", df.shape)

sns.boxplot(x='sepalwidth', data=df)
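
The same IQR rule can also be written as a boolean filter instead of dropping rows by position. This is an alternative sketch, not the method used above; the data values are made up, with two deliberate outliers (0.5 and 9.0), and it uses pandas' default linear quantile rather than the midpoint method:

```python
import pandas as pd

# Hypothetical sepal widths with two obvious outliers (0.5 and 9.0).
widths = pd.DataFrame(
    {"sepalwidth": [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.3, 0.5, 9.0]}
)

q1 = widths["sepalwidth"].quantile(0.25)
q3 = widths["sepalwidth"].quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the rows strictly inside the two fences.
inside = widths["sepalwidth"].between(lower_bound, upper_bound, inclusive="neither")
cleaned = widths[inside]

print("Old shape:", widths.shape)
print("New shape:", cleaned.shape)
```

Filtering with a mask leaves the original dataframe untouched, which makes it easy to compare the data before and after removal.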

Output:

Ex.No:4b: Various Commands in data frame:


1. pandas.DataFrame

pandas.DataFrame() is used to create a DataFrame in pandas. There are two ways to use this function. You can form a DataFrame column-wise by passing a dictionary into the pandas.DataFrame() function. Here, each key is a column, while the values are the rows:

import pandas
DataFrame = pandas.DataFrame({"A" : [1, 3, 4], "B": [5, 9, 12]})
print(DataFrame)

   A   B
0  1   5
1  3   9
2  4  12

2. Read From and Write to Excel or CSV in pandas

You can read or write to Excel or CSV files with pandas.

import pandas as pd
df = pd.read_csv("iris_csv.csv")
print(df)

3. Get the Mean, Median, and Mode

We can also compute the central tendencies of each column in a DataFrame using pandas:

DataFrame.mean()

# numeric_only=True skips the text column 'class' in the Iris data
df.median(numeric_only=True)

df.mode()

4. DataFrame.transform

pandas' DataFrame.transform() modifies the values of a DataFrame. It accepts a function as an argument.

data = df.transform(lambda y: y*3)
print(data)

5. DataFrame.isnull

This function flags null values as True; chaining sum() counts them per column:

df.isnull().sum()

sepallength    0
sepalwidth     0
petallength    0
petalwidth     0
class          0
dtype: int64

6. DataFrame.info

It returns the summary of non-missing values for each column instead:

df.info()

df.describe()

7. DataFrame.loc

loc is used to find the elements at a particular index. To view all items in the third row, for instance:

data=df.loc[2]

print(data)

8. DataFrame.max, min

Getting the maximum and minimum values using pandas is easy:

df.min()

df.max()

9. DataFrame.astype

The astype() function changes the data type of a particular column or DataFrame.

DataFrame.astype(str)

10. DataFrame.insert

The insert() function is used to add a new column to a DataFrame. It accepts three keywords: the column name (column), the values to store (value), and the position of the new column (loc):

DataFrame.insert(column = 'C', value = [3, 4, 6], loc=0)

print(DataFrame)

11. DataFrame.sum

The sum() function in pandas returns the sum of the values in each column, while cumsum() returns the running (cumulative) sum down each column:

DataFrame.sum()

DataFrame.cumsum()

12. Correlation:

Want to find the correlation between integer or float columns? pandas can help you
achieve that using the corr() function.

DataFrame.corr()

13. DataFrame.add

The add() function is used to add a specific number to each value in a DataFrame. It works by iterating through a DataFrame and operating on each item.

DataFrame['A'].add(20)

14. DataFrame.sub

Like the addition function, you can also subtract a number from each value in a DataFrame or specific column:

DataFrame['A'].sub(10)

15. DataFrame.mul

This is a multiplication version of the addition function of pandas:

DataFrame['A'].mul(10)

16. DataFrame.div

We can divide each data point in a column or DataFrame by a specific number:

DataFrame['A'].div(2)

17. DataFrame.std

Using the std() function, pandas also lets you compute the standard deviation for each column in a DataFrame. It works by iterating through each column in a dataset and calculating the standard deviation for each:

DataFrame.std()

18. DataFrame.melt

The melt() function in pandas flips the columns in a DataFrame to individual rows. It's like exposing the anatomy of a DataFrame. So it lets you view the value assigned to each column explicitly.

newDataFrame = DataFrame.melt()

print(newDataFrame)

19. DataFrame.pop

This function lets you remove a specified column from a pandas DataFrame. It accepts an item keyword, returns the popped column, and separates it from the rest of the DataFrame:

DataFrame.pop(item= 'B')

print(DataFrame)

20. DataFrame.dropna

The dropna() method removes all rows containing null values:

DataFrame.dropna(inplace = True)

print(DataFrame)

Result:

Thus the various commands in Data Frame were executed successfully.

Ex.No:5 Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis

Aim :
To find Frequency, Mean, Median, Mode, Variance, Standard Deviation,

Skewness and Kurtosis using diabetes dataset.
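
Each of the measures named in the aim has a direct pandas method. The following is a minimal sketch on a small made-up series (the values are hypothetical and not taken from the diabetes data set); the same calls can be applied to any numeric column such as df['Glucose']:

```python
import pandas as pd

# Hypothetical readings, not taken from the diabetes data set.
values = pd.Series([85, 90, 90, 100, 110, 120, 150], name="Glucose")

summary = {
    "frequency": values.value_counts().to_dict(),  # how often each value occurs
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),
    "variance": values.var(),    # sample variance (ddof=1)
    "std": values.std(),         # sample standard deviation (ddof=1)
    "skewness": values.skew(),   # positive means a longer right tail
    "kurtosis": values.kurt(),   # excess kurtosis (normal distribution = 0)
}

for key, val in summary.items():
    print(key, ":", val)
```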

Read Diabetes data set:
import pandas as pd
df = pd.read_csv('diabetes.csv')
df.head()
df.shape
Output:

(768, 9)
df.dtypes
Output:
Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object

df['Outcome'] = df['Outcome'].astype('bool')
df.dtypes['Outcome']
Output:
dtype('bool')

df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies                 768 non-null int64
Glucose                     768 non-null int64
BloodPressure               768 non-null int64
SkinThickness               768 non-null int64
Insulin                     768 non-null int64
BMI                         768 non-null float64
DiabetesPedigreeFunction    768 non-null float64
Age                         768 non-null int64
Outcome                     768 non-null bool
dtypes: bool(1), float64(2), int64(6)
memory usage: 48.9 KB

df.describe().T

Pregnancy Proportion:
import numpy as np
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(np.round(preg_proportion/sum(preg_proportion),3)*100,dtype=int)
preg = pd.DataFrame({'month':preg_month,'count_of_preg_prop':preg_proportion,
                     'percentage_proportion':preg_proportion_perc})
preg.set_index(['month'],inplace=True)
preg.head(10)

import seaborn as sns
import matplotlib.pyplot as plt
fig,axes = plt.subplots(nrows=3,ncols=2,dpi=120,figsize = (8,6))

# Row 1, left: count of each Pregnancies value (x= is required by recent seaborn)
plot00=sns.countplot(x='Pregnancies',data=df,ax=axes[0][0],color='green')
axes[0][0].set_title('Count',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count',fontdict={'fontsize':7})
plt.tight_layout()

# Row 1, right: the same counts split by Outcome
plot01=sns.countplot(x='Pregnancies',data=df,hue='Outcome',ax=axes[0][1])
axes[0][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count',fontdict={'fontsize':7})
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

# Row 2, left: distribution of Pregnancies
# (sns.distplot is deprecated in recent seaborn; sns.histplot(..., kde=True) is the suggested replacement)
plot10 = sns.distplot(df['Pregnancies'],ax=axes[1][0])
axes[1][0].set_title('Pregnancies Distribution',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][0].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plt.tight_layout()

# Row 2, right: histograms for the two Outcome groups
plot11 = df[df['Outcome']==False]['Pregnancies'].plot.hist(ax=axes[1][1],label='Non-Diab.')
plot11_2=df[df['Outcome']==True]['Pregnancies'].plot.hist(ax=axes[1][1],label='Diab.')
axes[1][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[1][1].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][1].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plot11.axes.legend(loc=1)
plt.setp(axes[1][1].get_legend().get_texts(), fontsize='6') # for legend text
plt.setp(axes[1][1].get_legend().get_title(), fontsize='6') # for legend title
plt.tight_layout()

# Row 3, left: box plot (five point summary) of Pregnancies
plot20 = sns.boxplot(df['Pregnancies'],ax=axes[2][0],orient='v')
axes[2][0].set_title('Pregnancies',fontdict={'fontsize':8})
axes[2][0].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][0].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.tight_layout()

# Row 3, right: box plots split by Outcome
plot21 = sns.boxplot(x='Outcome',y='Pregnancies',data=df,ax=axes[2][1])
axes[2][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[2][1].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][1].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
plt.tight_layout()
plt.show()

Understanding Distribution
The distribution of Pregnancies in the data is unimodal and skewed to the right, centered at about 1, with most of the data between 0 and 15 (a range of roughly 15). Outliers are present on the higher end.

Glucose Variable
df.Glucose.describe()
Output:
count    768.000000
mean     120.894531
std       31.972618
min        0.000000
25%       99.000000
50%      117.000000
75%      140.250000
max      199.000000
Name: Glucose, dtype: float64

#sns.set_style('darkgrid')
fig,axes = plt.subplots(nrows=2,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.distplot(df['Glucose'],ax=axes[0][0],color='green')
#axes[0][0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0][0].set_title('Distribution of Glucose',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Glucose Class',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.distplot(df[df['Outcome']==False]['Glucose'],ax=axes[0][1],color='green',label='Non Diab.')
sns.distplot(df[df.Outcome==True]['Glucose'],ax=axes[0][1],color='red',label='Diab')
axes[0][1].set_title('Distribution of Glucose',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Glucose Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
#axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10=sns.boxplot(df['Glucose'],ax=axes[1][0],orient='v')
axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Glucose',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.tight_layout()

plot11=sns.boxplot(x='Outcome',y='Glucose',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()

plt.show()

Understanding Distribution
The distribution of Glucose level among patients is unimodal and roughly bell shaped,
centered at abo ut 115 with most of the data between 90 and 140, A range of 150, and
outliers are pres roughly ent on the lower end(Glucose ==0).

verify distributio n by keeping only non zero entry of Glucose:


fig,axes = plt.subplots(nrows=1,ncols=2,dpi=120,figsize = (8,4))

plot0=sns.distplot(df[df['Glucose']!=0]['Glucose'],ax=axes[0],color='green')
#axes[0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0].set_title('Distribution of Glucose',fontdict={'fontsize':8})
axes[0].set_xlabel('Glucose Class',fontdict={'fontsize':7})
axes[0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot1=sns.boxplot(df[df['Glucose']!=0]['Glucose'],ax=axes[1],orient='v')
axes[1].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1].set_xlabel('Glucose',fontdict={'fontsize':7})
axes[1].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.tight_layout()


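Blood Pressure variable
df.BloodPressure.describe()
count    768.000000
mean      69.105469
std       19.355807
min        0.000000
25%       62.000000
50%       72.000000
75%       80.000000
max      122.000000
Name: BloodPressure, dtype: float64

fig,axes = plt.subplots(nrows=2,ncols=2,dpi=120)

plot00=sns.distplot(df['BloodPressure'],ax=axes[0][0],color='green')
axes[0][0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0][0].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][0].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()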

plot01=sns.distplot(df[df['Outcome']==False]['BloodPressure'],ax=axes[0][1],color='green',label='Non Diab.')
sns.distplot(df[df.Outcome==True]['BloodPressure'],ax=axes[0][1],color='red',label='Diab')
axes[0][1].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][1].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10=sns.boxplot(df['BloodPressure'],ax=axes[1][0],orient='v')
axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('BP',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()

plot11=sns.boxplot(x='Outcome',y='BloodPressure',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()

plt.show()

Understanding Distribution
The distribution of BloodPressure among patients is unimodal (it is not bimodal,
because BP = 0 does not make any sense and is an outlier) and bell shaped, centered at about
65 with most of the data between 60 and 90, a range of roughly 100, and outliers are present
on the lower end (BP == 0).

5b) Bivariate analysis: Linear and logistic regression modeling

import os
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

os.chdir("C:/Users/Administrator/Desktop/DS")
df = pd.read_csv('diabetes.csv')
df.head()

# Recent seaborn versions require x and y to be passed as keyword arguments.
sns.scatterplot(x=df.DiabetesPedigreeFunction,y=df.Glucose)
plt.ylim(0,20000)

sns.scatterplot(x=df.BMI,y=df.Age)
plt.ylim(0,20000)

sns.scatterplot(x=df.BloodPressure,y=df.Glucose)
plt.ylim(0,20000)

plt.figure(figsize=(12,8))
sns.kdeplot(data=df,x=df.Glucose,hue=df.Outcome,fill=True)
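
The plots above only visualize pairs of variables; the regression modeling named in the heading can be sketched as follows. This is a minimal illustration on synthetic data, not the manual's own method: the arrays x, y and labels are hypothetical stand-ins for two numeric columns and a binary outcome, and the logistic fit uses plain gradient descent written with NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two numeric columns (synthetic, not the diabetes data).
x = rng.uniform(20, 40, size=200)
y = 2.0 * x + 50 + rng.normal(0, 5, size=200)

# Simple linear regression: fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))

# Logistic regression: model P(label = 1) as a sigmoid of a linear score.
labels = (x + rng.normal(0, 3, size=200) > 30).astype(float)
xs = (x - x.mean()) / x.std()               # standardise for stable gradient steps
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * xs + b)))
    w -= 0.1 * np.mean((p - labels) * xs)   # gradient of the log-loss w.r.t. w
    b -= 0.1 * np.mean(p - labels)          # gradient of the log-loss w.r.t. b

pred = (1.0 / (1.0 + np.exp(-(w * xs + b))) > 0.5).astype(float)
accuracy = np.mean(pred == labels)
print("logistic weight:", round(w, 2), "accuracy:", round(accuracy, 2))
```

With the real data, x and y would be replaced by two columns of df and labels by df.Outcome; a library such as Statsmodels could be used in place of the hand-written loop.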


5c) Multiple Regression analysis

df.isnull().values.any()
False

(df.Pregnancies == 0).sum(),(df.Glucose==0).sum(),(df.BloodPressure==0).sum(),(df.SkinThickness==0).sum(),(df.Insulin==0).sum(),(df.BMI==0).sum(),(df.DiabetesPedigreeFunction==0).sum(),(df.Age==0).sum()
## Counting cells with 0 Values for each variable and publishing the counts below

Output:
(111, 5, 35, 227, 374, 11, 0, 0)

drop_Glu = df.index[df.Glucose == 0].tolist()
drop_BP = df.index[df.BloodPressure == 0].tolist()
drop_Skin = df.index[df.SkinThickness==0].tolist()
drop_Ins = df.index[df.Insulin==0].tolist()
drop_BMI = df.index[df.BMI==0].tolist()
c=drop_Glu+drop_BP+drop_Skin+drop_Ins+drop_BMI
dia=df.drop(df.index[c])
dia.info()

Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 3 to 765
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               392 non-null    int64
 1   Glucose                   392 non-null    int64
 2   BloodPressure             392 non-null    int64
 3   SkinThickness             392 non-null    int64
 4   Insulin                   392 non-null    int64
 5   BMI                       392 non-null    float64
 6   DiabetesPedigreeFunction  392 non-null    float64
 7   Age                       392 non-null    int64
 8   Outcome                   392 non-null    bool
dtypes: bool(1), float64(2), int64(6)
memory usage: 27.9 KB
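
The zero-counting and row-dropping steps above can also be written with a single boolean mask. The sketch below assumes a small made-up table named toy in place of the diabetes data; the column names match the data set, the values do not.

```python
import pandas as pd

# Hypothetical miniature of the diabetes table; the values are made up.
toy = pd.DataFrame({
    "Glucose":       [148, 0, 183, 89, 137],
    "BloodPressure": [72, 66, 0, 66, 40],
    "BMI":           [33.6, 26.6, 23.3, 0.0, 43.1],
})

# Count the zero entries in each column.
zero_counts = (toy == 0).sum()
print(zero_counts)

# Keep only the rows in which none of the listed columns is zero.
cols = ["Glucose", "BloodPressure", "BMI"]
clean = toy[(toy[cols] != 0).all(axis=1)]
print(clean)
```

The mask avoids building one index list per column, and the original row labels are preserved in the result, as they are with df.drop.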

cor = dia.corr(method ='pearson')
cor

Output

sns.heatmap(cor)

Result:
Thus the univariate, bivariate and multivariate analyses were performed successfully.


Ex.No:6 Apply and explore various plotting functions on UCI data sets

Aim:
To apply and explore various plotting functions on UCI data sets.
To load and quickly visualize the Multiple Features Dataset [1] from the UCI repository, which
is available in mvlearn. This dataset can be a good tool for analyzing the effectiveness
of multiview algorithms. It contains 6 views of handwritten digit images, thus allowing for
analysis of multiview algorithms in multiclass or unsupervised tasks.

a. Normal curves
A probability distribution is a statistical function that describes the likelihood of obtaining the
possible values that a random variable can take. By this, we mean the range of values that a
parameter can take when we randomly pick values from it. Suppose we were asked to pick 1 adult
at random and say what his/her height would be (assuming gender does not affect height).
There is no way to know what the height will be. But if we have the distribution of heights of
adults in the city, we can bet on the most probable outcome. A Normal Distribution is also known
as a Gaussian distribution or, famously, the Bell Curve. People use these names interchangeably, but
they mean the same thing. It is a continuous probability distribution.
Code:
import numpy as np
import matplotlib.pyplot as plt

# Creating a series of data in the range of 1-50.
x = np.linspace(1,50,200)

# Creating a function.
def normal_dist(x , mean , sd):
    # Normal density: 1/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)/sd)**2)
    prob_density = 1/(sd*np.sqrt(2*np.pi)) * np.exp(-0.5*((x-mean)/sd)**2)
    return prob_density

# Calculate mean and standard deviation.
mean = np.mean(x)
sd = np.std(x)

# Apply function to the data.
pdf = normal_dist(x,mean,sd)

# Plotting the results
plt.plot(x,pdf , color = 'red')
plt.xlabel('Data points')
plt.ylabel('Probability Density')
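
A quick numerical check of the normal density can be sketched as follows. The function name normal_pdf and the values mean = 25 and sd = 5 are illustrative assumptions; the check confirms that the area under the curve is close to 1 and that the peak sits at the mean.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    # Normal density: 1/(sd*sqrt(2*pi)) * exp(-(x-mean)^2 / (2*sd^2))
    return 1.0 / (sd * np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * ((x - mean) / sd) ** 2)

mean, sd = 25.0, 5.0                     # illustrative values
x = np.linspace(mean - 8 * sd, mean + 8 * sd, 4001)
pdf = normal_pdf(x, mean, sd)

dx = x[1] - x[0]
area = np.sum(pdf) * dx                  # Riemann-sum approximation of the integral
peak = x[np.argmax(pdf)]                 # location of the highest density
print("area under curve:", round(area, 4))
print("peak location:", peak)
```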

b. Density and contour plots


Contour plots, also called level plots, are a tool for doing multivariate analysis and visualizing 3-D
plots in 2-D space. If we consider X and Y as the variables we want to plot, then the response
Z will be plotted as slices on the X-Y plane, due to which contours are sometimes referred to as
Z-slices or iso-response.

Contour plots are widely used to visualize density, altitudes or heights of mountains, as well
as in the meteorological department. Due to such wide usage, matplotlib.pyplot provides
a method contour to make it easy for us to draw contour plots.

Code:
import matplotlib.pyplot as plt
import numpy as np

feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)

# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)

fig, ax = plt.subplots(1, 1)

Z = np.cos(X / 2) + np.sin(Y / 4)

# plots contour lines
ax.contour(X, Y, Z)

ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')

plt.show()
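
The program above draws contour lines only. A density plot can be sketched with filled contours over a two-dimensional histogram. The sample below is synthetic (two independent normal variables, chosen purely for illustration), and the bin count of 30 is an arbitrary assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample: 5000 points from two independent normal variables.
xs = rng.normal(0, 1, 5000)
ys = rng.normal(0, 2, 5000)

# Estimate the density on a 30 x 30 grid of bins.
H, xedges, yedges = np.histogram2d(xs, ys, bins=30, density=True)
xc = 0.5 * (xedges[:-1] + xedges[1:])    # bin centres along x
yc = 0.5 * (yedges[:-1] + yedges[1:])    # bin centres along y

fig, ax = plt.subplots()
# histogram2d indexes H as [x, y]; contourf expects [y, x], hence H.T
filled = ax.contourf(xc, yc, H.T, levels=10, cmap="viridis")
fig.colorbar(filled, ax=ax, label="estimated density")
ax.set_title("Density plot (filled contours)")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```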

c. Correlation and scatter plots

Correlation means an association. It is a measure of the extent to which two variables are related.
1. Positive Correlation: When two variables increase together and decrease together, they are positively
correlated. '1' is a perfect positive correlation. For example, demand and profit are positively
correlated: the more the demand for the product, the more the profit, hence a positive correlation.
2. Negative Correlation: When one variable increases while the other decreases, and vice-versa,
they are negatively correlated. For example, if the distance between magnets increases, their
attraction decreases, and vice-versa. Hence, a negative correlation. '-1' is a perfect negative correlation.
3. Zero Correlation (No Correlation): When two variables don't seem to be linked at all. '0' is no
correlation. For example, the amount of tea you take and your level of intelligence.
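
The three cases above can be sketched numerically with the Pearson correlation coefficient. The arrays below are made-up illustrations, not taken from any data set used in this manual.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(1.0, 101.0)

y_pos = 3.0 * x + 2.0                # rises exactly with x
y_neg = -0.5 * x + 10.0              # falls exactly as x rises
y_none = rng.normal(0, 1, x.size)    # random noise, unrelated to x

for name, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
    print(name, round(r, 3))
```

The first two values are +1 and -1 up to rounding because the relationships are exactly linear; the third is only close to 0, since a finite random sample rarely gives a coefficient of exactly zero.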

Code:
import pandas as pd
import seaborn as sns

con = pd.read_csv('concrete.csv')
con

list(con.columns)
con.head()

con['cement'] = con['cement'].astype('category')
con.describe(include='category')

sns.scatterplot(x="water", y="coarseagg", data=con);

# Title and axis label are set to match the variables actually plotted.
ax = sns.scatterplot(x="water", y="coarseagg", data=con)
ax.set_title("Coarse Aggregate vs. Water")
ax.set_xlabel("water");

sns.lmplot(x="water", y="coarseagg", data=con);

d. Histograms:
A histogram is basically used to represent data provided in the form of groups. It is an accurate
method for the graphical representation of a numerical data distribution. It is a type of bar plot
where the X-axis represents the bin ranges while the Y-axis gives information about frequency.

Creating a Histogram
To create a histogram, the first step is to create bins of the ranges, then distribute the whole range
of the values into a series of intervals, and count the values which fall into each of the
intervals. Bins are clearly identified as consecutive, non-overlapping intervals of variables. The
matplotlib.pyplot.hist() function is used to compute and draw the histogram.

Code:
from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])

# Creating histogram
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])

# Show plot
plt.show()
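
The binning step described above can be inspected without drawing anything. The sketch below reuses the same data and bin edges and prints how many values fall into each interval, using np.histogram.

```python
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
bins = [0, 25, 50, 75, 100]

# np.histogram returns the number of values in each bin and the bin edges.
# Every bin includes its left edge; only the last bin also includes its right edge.
counts, edges = np.histogram(a, bins=bins)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left} to {right}: {n}")
```

These counts are the bar heights that ax.hist draws for the same bins.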

Code:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25

# Creating histogram
fig, axs = plt.subplots(1, 1, figsize =(10, 7), tight_layout = True)
axs.hist(x, bins = n_bins)

# Show plot
plt.show()

e. Three dimensional plotting

Matplotlib was introduced with only two-dimensional plotting in mind. Around the time of
the 1.0 release, 3d utilities were built on top of the 2d ones, and thus we have a 3d
implementation of data available today. The 3d plots are enabled by importing the
mplot3d toolkit. In this exercise, we will deal with 3d plots using matplotlib.

Code:

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection ='3d')

# defining axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
c = x + y

# syntax for plotting
ax.scatter(x, y, z, c = c)
ax.set_title('3d Scatter plot')

plt.show()
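
Besides scatter plots, the same 3d axes can draw surfaces. The sketch below is an illustrative addition: the function sin(sqrt(x^2 + y^2)) and the 50 x 50 grid are arbitrary choices made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Build a 2-D grid and evaluate an example function on it.
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')    # 3-D axes
ax.plot_surface(X, Y, Z, cmap='viridis') # coloured surface over the grid
ax.set_title('3d Surface plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
```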

Result:
Thus the various plots are executed and plotted successfully.


Ex.No:7 Visualizing Geographic Data with Basemap

Aim:
To visualize geographic data with Basemap.
One common type of visualization in data science is that of geographic data. Matplotlib's main
tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib
toolkits which live under the mpl_toolkits namespace. Admittedly, Basemap feels a bit clunky
to use, and often even simple visualizations take much longer to render than you might hope.
More modern solutions such as leaflet or the Google Maps API may be a better choice for more
intensive map visualizations. Still, Basemap is a useful tool for Python users to have in their
virtual toolbelts. In this section, we'll show several examples of the type of map visualization
that is possible with this toolkit.
Installation of Basemap is straightforward; if you're using conda you can type this and
the package will be downloaded:
conda install basemap

Code:

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines()
m.drawcoastlines(linewidth=1.0, linestyle='dashed', color='red')
plt.title("Coastlines", fontsize=20)
plt.show()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
from shapely.geometry import Point

fp = r'Map...'  # shapefile path is truncated in the source
map_df_copy = gpd.read_file(fp)

# A GeoDataFrame is drawn with its own plot() method.
map_df_copy.plot(markersize=5)


Result:

Thus Basemap was installed and the programs for geographic visualization were executed
successfully.
