
4931_Grace College of Engineering,

GRACE COLLEGE OF ENGINEERING

MULLAKKADU, THOOTHUKUDI -628005

DEPARTMENT OF COMPUTER SCIENCE AND

ENGINEERING

CS 3361 - DATA SCIENCE LABORATORY

II YEAR / III SEMESTER
REGULATION 2021

LAB
PREPARED BY Mrs. P. JOY SUGANTHY BAI,
Assistant Professor
CSE Department
Grace College of Engineering, Thoothukudi

CS3361_DATA SCIENCE
4931_Grace College of Engineering,

List of Experiments
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands
for doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for
performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap


Ex.No:1 Download, install and explore the features of
NumPy, SciPy, Jupyter, Statsmodels and Pandas packages

Anaconda:

Anaconda is a distribution of the Python and R programming languages for scientific
computing (data science, machine learning applications, large-scale data processing,
predictive analytics, etc.) that aims to simplify package management and deployment.

Jupyter Notebook:
The Jupyter Notebook is an open-source web application that allows you to create and
share documents that contain live code, equations, visualizations, and narrative text. Its
uses include data cleaning and transformation, numerical simulation, statistical modeling,
data visualization, machine learning, and much more.
NumPy:
NumPy is a Python library used for working with arrays. It also has functions for working
in the domains of linear algebra, Fourier transforms, and matrices.
SciPy:
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands
for Scientific Python. It provides more utility functions for optimization, stats, and signal
processing. Like NumPy, SciPy is open source, so we can use it freely.
NumPy, which stands for Numerical Python, is used for the manipulation of elements of
numerical array data. SciPy, which stands for Scientific Python, is used for numerical
computations in Python. Both packages provide extended functionality to work with Python.
Statsmodels:
Statsmodels is a popular library in Python that enables us to estimate and analyze various
statistical models. It includes various models, such as weighted least squares.

Pandas:
Pandas is a powerful library. It provides a large set of commands and features that are
used to analyze data easily. We can use Pandas to perform various tasks such as
filtering data according to certain conditions, or segmenting and segregating the data
according to preference, etc.
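
The NumPy description above mentions linear algebra and Fourier transforms. A minimal sketch of both follows; the matrix, right-hand side, and signal are values chosen only for illustration and are not part of the lab exercises.

```python
import numpy as np

# Solve the linear system A @ x = b (illustrative coefficients).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print("Solution of A @ x = b:", x)

# Discrete Fourier transform of a short signal, then its inverse.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real
print("Spectrum:", spectrum)
print("Recovered signal:", recovered)
```

Applying the inverse transform to the spectrum should return the original signal up to floating-point rounding, which is a quick sanity check that the installation works.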

Download Anaconda:

1. Type “Anaconda Download” in Google Chrome.

2. Scroll down the Anaconda products website and click the 64-bit installer.

3. Click the downloaded .exe file and install Anaconda.

4. Click the Just Me option.

5. Anaconda is stored in the C:\Users path.

6. After installation, open the Anaconda Navigator.

7. Launch Jupyter Notebook for typing programs.

8. Click “New”.

9. Click the Run button to get the output.

10. Sample programs
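
One possible sample program for a first notebook cell is a check that the packages named in this experiment are actually installed. The sketch below uses only the standard library; the helper name installed_versions and the package list are assumptions made for illustration ("notebook" is assumed to be the distribution name of the Jupyter Notebook).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package list is an assumption; adjust it to the environment being checked.
report = installed_versions(["numpy", "scipy", "pandas", "statsmodels", "notebook"])
for name, version in report.items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

A package reported as NOT INSTALLED can usually be added from the Anaconda Navigator environment page.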

Result:
Thus the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages are downloaded and
installed successfully.


Ex.No:2 Working with Numpy arrays

Aim:
To work with NumPy arrays using Jupyter Notebook.

Program 1:
# Python program to demonstrate
# basic array characteristics
import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

Output:
Array is of type:  <class 'numpy.ndarray'>
No. of dimensions:  2
Shape of array:  (2, 3)
Size of array:  6
Array stores elements of type:  int64

Program 2:
# Python program to demonstrate
# array creation techniques
import numpy as np

# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)

# Creating array from tuple
b = np.array((1, 3, 2))
print ("\nArray created using passed tuple:\n", b)

# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)

# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s. "
       "Array type is complex:\n", d)

# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)

# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)

# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between "
       "0 and 5:\n", g)

# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4],
                [5, 2, 4, 2],
                [1, 2, 0, 1]])

newarr = arr.reshape(2, 2, 3)

print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)

# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()

print ("\nOriginal array:\n", arr)
print ("Flattened array:\n", flarr)


Output:
Array created using passed list:
[[ 1. 2. 4.]
[ 5. 8. 7.]]

Array created using passed tuple:
[1 3 2]

An array initialized with all zeros:
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]

An array initialized with all 6s. Array type is complex:
[[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]]

A random array:
[[ 0.46829566 0.67079389]
[ 0.09079849 0.95410464]]

A sequential array with steps of 5:
[ 0 5 10 15 20 25]

A sequential array with 10 values between 0 and 5:
[ 0. 0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
3.33333333 3.88888889 4.44444444 5. ]

Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]

[[4 2 1]
[2 0 1]]]

Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]

Program 3:
# Python program to demonstrate
# indexing in numpy
import numpy as np

# An exemplar array
arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])

# Slicing array
temp = arr[:2, ::2]
print ("Array with first 2 rows and alternate "
       "columns(0 and 2):\n", temp)

# Integer array indexing example
temp = arr[[0, 1, 2, 3], [3, 2, 1, 0]]
print ("\nElements at indices (0, 3), (1, 2), (2, 1),"
       "(3, 0):\n", temp)

# boolean array indexing example
cond = arr > 0 # cond is a boolean array
temp = arr[cond]
print ("\nElements greater than 0:\n", temp)

Output:

Array with first 2 rows and alternate columns(0 and 2):
[[-1. 0.]
[ 4. 6.]]

Elements at indices (0, 3), (1, 2), (2, 1),(3, 0):
[ 4. 6. 0. 3.]

Elements greater than 0:
[ 2. 4. 4. 6. 2.6 7. 8. 3. 4. 2. ]

Program 4:

# Python program to demonstrate
# basic operations on single array
import numpy as np

a = np.array([1, 2, 5, 3])

# add 1 to every element
print ("Adding 1 to every element:", a+1)

# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)

# multiply each element by 10
print ("Multiplying each element by 10:", a*10)

# square each element
print ("Squaring each element:", a**2)

# modify existing array
a *= 2
print ("Doubled each element of original array:", a)

# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])

print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)

Output:
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]

Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]

Program 5:
# Python program to demonstrate sorting in numpy
import numpy as np

a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])

# sorted array
print ("Array elements in sorted order:\n",
       np.sort(a, axis = None))

# sort array row-wise
print ("Row-wise sorted array:\n",
       np.sort(a, axis = 1))

# specify sort algorithm
print ("Column wise sort by applying merge-sort:\n",
       np.sort(a, axis = 0, kind = 'mergesort'))

# Example to show sorting of structured array

# set alias names for dtypes
# ('U10' is a 10-character text field; 'S10' would print byte strings such as b'Ajay' in Python 3)
dtypes = [('name', 'U10'), ('grad_year', int), ('cgpa', float)]

# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n",
       np.sort(arr, order = 'name'))

print ("Array sorted by graduation year and then cgpa:\n",
       np.sort(arr, order = ['grad_year', 'cgpa']))

Output:
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]

Array sorted by names:
[('Aakash', 2009, 9.0) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
[('Pankaj', 2008, 7.9) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Aakash', 2009, 9.0)]

Result:
Thus the programs using numpy executed successfully.


Ex.No:3 Working with Pandas Data Frames

Aim:
To work with Pandas data frames using Jupyter Notebook.

Pandas Data Frames:

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was
created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Example 1:

import pandas

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)
print(myvar)

Create Labels

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)
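
Once labels are assigned, values can be looked up by label instead of by position. The following is a small sketch that reuses the same values as the Create Labels example; the variable name subset is introduced here only for illustration.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

# Access a single value by its label.
print(myvar["y"])

# Access several labels at once; the result is again a Series.
subset = myvar[["x", "z"]]
print(subset)
```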

What is a DataFrame?

A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table
with rows and columns.

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)

Read CSV Files

A simple way to store big data sets is to use CSV files (comma separated files).

CSV files contain plain text in a well-known format that can be read by everyone, including
Pandas.

import pandas as pd

# Raw string, so that the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'C:\Users\New\Desktop\AD8302\data.csv')
print(df.to_string())
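
When the CSV file is not at hand, the same call can be tried on in-memory text. This is a sketch under assumptions: the column names and values below are made up and only stand in for the contents of data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,117,282.4
30,103,
"""

# read_csv accepts any file-like object, not only a path.
df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())
print(df.shape)
```

The empty last field is read as NaN, which is how missing entries in a CSV file usually show up in a DataFrame.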

Result:
Thus the programs using Data Frames were executed successfully.


Ex.No:4 Reading data from text files, Excel and the web and
exploring various commands for doing descriptive analysis
on the Iris data set.

Aim:
To read data from text files, Excel and the web, and to explore various commands for
doing descriptive analysis on the Iris data set.

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a technique to analyze data using some visual
techniques. With this technique, we can get detailed information about the statistical summary
of the data.

Iris Dataset
The Iris dataset is considered the Hello World of data science. It contains five columns, namely
Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a flowering
plant; researchers have measured various features of different iris flowers and recorded
them digitally.

Read a web dataset:
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

Output:

I. Getting Information about the Dataset

1. Shape: use the shape attribute to get the shape of the dataset.
df.shape

Output:
(150, 6)
The dataframe contains 6 columns and 150 rows.

2. Info(): shows the columns and their data types.

df.info()
Output:

3. Describe(): The describe() function applies basic statistical computations on the dataset, such as
extreme values, count of data points, standard deviation, etc. Any missing value or NaN value
is automatically skipped. The describe() function gives a good picture of the distribution of the data.
df.describe()

Output:

II. Checking Missing Values

1. Isnull(): We will check whether our data contains any missing values. Missing values can
occur when no information is provided for one or more items or for a whole unit.
df.isnull().sum()
Output:
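
The way describe() skips missing values and isnull().sum() counts them can be checked on a small hand-made frame. This is only a sketch: the frame toy and its values are invented for illustration and are not rows of the Iris file.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing value per column.
toy = pd.DataFrame({
    "sepallength": [5.1, 4.9, np.nan, 6.3],
    "sepalwidth": [3.5, 3.0, 3.2, np.nan],
})

missing = toy.isnull().sum()   # number of NaN entries per column
summary = toy.describe()       # count, mean, std, min, quartiles, max

print(missing)
print(summary)
```

The count row of the summary is 3 for both columns although the frame has 4 rows, which shows that the NaN entries were left out of the statistics.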


2. Checking Duplicates:
The Pandas drop_duplicates() method helps in removing duplicates from the data frame. With
subset="class", only the first row of each class is kept.
data = df.drop_duplicates(subset ="class")

Output:

3. Count:
The Series.value_counts() function returns a Series containing counts of unique values.
df.value_counts("class")

Output:
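
The difference between dropping duplicates on one column and counting values can be seen on a few rows. A minimal sketch, assuming a made-up frame toy with a class column like the one used above:

```python
import pandas as pd

# Hypothetical rows; only the column name "class" matches the lab data.
toy = pd.DataFrame({
    "petallength": [1.4, 1.3, 4.7, 4.5, 6.0],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica"],
})

# Keeps the first row seen for each distinct class value.
first_per_class = toy.drop_duplicates(subset="class")

# Counts how many rows belong to each class.
counts = toy["class"].value_counts()

# Counts rows that are exact copies of an earlier row (all columns equal).
n_full_duplicates = toy.duplicated().sum()

print(first_per_class)
print(counts)
print(n_full_duplicates)
```

Note that drop_duplicates(subset="class") reduces the frame to one row per class, so it is a way to list the classes rather than a check for repeated measurements; duplicated() without a subset is the check for fully repeated rows.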

III. Data Visualization: We will use the Matplotlib and Seaborn libraries for data visualization.
Matplotlib is an easy-to-use visualization library in Python. It is built on NumPy
arrays, is designed to work with the broader SciPy stack, and offers several plots such as
line, bar, scatter, histogram, etc.
Seaborn is a library mostly used for statistical plotting in Python. It is built on top of
Matplotlib and provides beautiful default styles and color palettes to make statistical plots
more attractive.

Data visualization using the Matplotlib and Seaborn libraries:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='class', data=df)
plt.show()


Output:

Relation between variables

Hue: The hue parameter denotes which column decides the color of the points.
Legend(): A legend is an area describing the elements of the graph.
Bounding Box: bbox_to_anchor=[x0, y0] will create a bounding box with its lower left corner at
position [x0, y0]. The legend will then be placed 'inside' this box and overlap it according to
the specified loc parameter.
Loc: The loc attribute of legend() is used to specify the location of the legend. The default
value is loc="best"; loc=2, used below, corresponds to "upper left".

Example 1: Comparing Sepal Length and Sepal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='sepallength', y='sepalwidth',
                hue='class', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

Output:


From the above plot, we can infer that:

• Species Setosa has smaller sepal lengths but larger sepal widths.
• Versicolor species lies in the middle of the other two species in terms of sepal length and
width.
• Species Virginica has larger sepal lengths but smaller sepal widths.

Example 2: Comparing Petal Length and Petal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='petallength', y='petalwidth',
                hue='class', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

Output:


From the above plot, we can infer that:

• Species Setosa has smaller petal lengths and widths.
• Versicolor species lies in the middle of the other two species in terms of petal length and
width.
• Species Virginica has the largest petal lengths and widths.

Let's plot all the columns' relationships using a pairplot. It can be used for multivariate
analysis.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df.drop(['Id'], axis = 1),
             hue='class', height=2)

Histograms

Histograms allow uni- as well as bi-variate analysis.
Example:
import seaborn as sns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10,10))

axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['sepallength'], bins=7)

axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['sepalwidth'], bins=5)

axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['petallength'], bins=6)

axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['petalwidth'], bins=6)

Output:

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. Non-numeric columns are left out of the
calculation; pandas 2.0 and later require numeric_only=True for this.

data = df.drop_duplicates(subset ="class",)

data.corr(method='pearson', numeric_only=True)

Output:
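
The Pearson coefficient returned by corr() is the sum of products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it by hand and compares it with pandas; the frame toy and its values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical petal measurements (not taken from the Iris file).
toy = pd.DataFrame({
    "petallength": [1.4, 1.3, 4.7, 4.5, 6.0],
    "petalwidth": [0.2, 0.2, 1.4, 1.5, 2.5],
})

x = toy["petallength"].to_numpy()
y = toy["petalwidth"].to_numpy()

# Deviations of every observation from its column mean.
dx, dy = x - x.mean(), y - y.mean()

# Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same coefficient as computed by pandas.
r_pandas = toy.corr(method="pearson").loc["petallength", "petalwidth"]

print(r_manual, r_pandas)
```

Both values agree up to rounding, and a value close to 1 indicates that the two columns increase together almost linearly.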

Box Plots

We can use boxplots to see how the categorical value is distributed with other numerical
values.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

def graph(y):
    sns.boxplot(x="class", y=y, data=df)

plt.figure(figsize=(10,10))

# Adding the subplot at the specified
# grid position
plt.subplot(221)
graph('sepallength')

plt.subplot(222)
graph('sepalwidth')

plt.subplot(223)
graph('petallength')

plt.subplot(224)
graph('petalwidth')

plt.show()

Output:
Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called
normal) objects. Outliers can be caused by measurement or execution errors. The analysis for
outlier detection is referred to as outlier mining. There are many ways to detect outliers, and the
removal process is the same as removing a data item from the pandas dataframe.
Let's consider the iris dataset and plot the boxplot for the sepalwidth column.
Example:
# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('iris_csv.csv')

sns.boxplot(x='sepalwidth', data=df)


Output:

Removing Outliers

To remove an outlier, one follows the same process as removing any entry from the
dataset using its exact position, because each of the above detection methods ends with
a list of the data items that satisfy the outlier definition of the method used.
Example: We will detect the outliers using IQR and then remove them. We will also
draw the boxplot to see whether the outliers are removed.
import pandas as pd
import seaborn as sns
import numpy as np

# Load the dataset
df = pd.read_csv('iris_csv.csv')

# IQR (NumPy >= 1.22 names this argument method=; older versions call it interpolation=)
Q1 = np.percentile(df['sepalwidth'], 25,
                   method = 'midpoint')

Q3 = np.percentile(df['sepalwidth'], 75,
                   method = 'midpoint')
IQR = Q3 - Q1

print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['sepalwidth'] >= (Q3+1.5*IQR))

# Lower bound
lower = np.where(df['sepalwidth'] <= (Q1-1.5*IQR))

# Removing the Outliers
df.drop(upper[0], inplace = True)
df.drop(lower[0], inplace = True)

print("New Shape: ", df.shape)

sns.boxplot(x='sepalwidth', data=df)

Output:
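
The IQR rule can also be wrapped in a small reusable function. The sketch below is one possible variant, not the program above: the helper name remove_iqr_outliers and the sample values are assumptions, it uses pandas' default quantile interpolation instead of the midpoint method, and it keeps values that lie exactly on a bound.

```python
import pandas as pd


def remove_iqr_outliers(frame, column, k=1.5):
    """Return a copy of frame without rows whose value in column
    lies outside the interval [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    keep = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[keep].copy()


# Hypothetical sepal widths with one low and one high outlier.
toy = pd.DataFrame({"sepalwidth": [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 4.4, 2.0]})
cleaned = remove_iqr_outliers(toy, "sepalwidth")

print("Old Shape: ", toy.shape)
print("New Shape: ", cleaned.shape)
```

Returning a filtered copy leaves the original frame untouched, which makes it easy to compare the boxplots before and after the removal.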

Ex.No:4b: Various Commands in data frame:

1. pandas.DataFrame

pandas.DataFrame() is used to create a DataFrame in pandas. There are two ways to use this
function. You can form a DataFrame column-wise by passing a dictionary into
the pandas.DataFrame() function. Here, each key is a column, while the values are the rows:

import pandas
DataFrame = pandas.DataFrame({"A" : [1, 3, 4], "B": [5, 9, 12]})
print(DataFrame)

   A   B
0  1   5
1  3   9
2  4  12

2. Read From and Write to Excel or CSV in pandas

You can read or write to Excel or CSV files with pandas.

import pandas as pd
df = pd.read_csv("iris_csv.csv")
print(df)

3. Get the Mean, Median, and Mode

We can also compute the central tendencies of each column in a DataFrame using
pandas:

df.mean(numeric_only=True)

df.median(numeric_only=True)

df.mode()

4. DataFrame.transform

pandas' DataFrame.transform() modifies the values of a DataFrame. It accepts a function as an
argument.

data = df.transform(lambda y: y*3)
print(data)

5. DataFrame.isnull

This function flags null values as True; chaining sum() counts them per column:

df.isnull().sum()

sepallength 0
sepalwidth 0
petallength 0
petalwidth 0
class 0
dtype: int64

6. DataFrame.info

It returns the summary of non-missing values for each column instead:

df.info()

df.describe()

7. DataFrame.loc

loc is used to find the elements at a particular index. To view all items in the third row, for
instance:

data=df.loc[2]

print(data)

8. DataFrame.max, min

Getting the maximum and minimum values using pandas is easy:

df.min()

df.max()

9. DataFrame.astype

The astype() function changes the data type of a particular column or DataFrame.

DataFrame.astype(str)

10. DataFrame.insert

The insert() function is used to add a new column to a DataFrame. It accepts three keywords:
the column name (column), its values (value), and its position (loc):

DataFrame.insert(column = 'C', value = [3, 4, 6], loc=0)

print(DataFrame)

11. DataFrame.sum

The sum() function in pandas returns the sum of the values in each column, while cumsum()
returns the running (cumulative) sum down each column:

DataFrame.sum()

DataFrame.cumsum()

12. Correlation:

Want to find the correlation between integer or float columns? pandas can help you
achieve that using the corr() function.

DataFrame.corr()

13. DataFrame.add

The add() function is used to add a specific number to each value in a DataFrame. It works
by iterating through a DataFrame and operating on each item.

DataFrame['A'].add(20)

14. DataFrame.sub

Like the addition function, you can also subtract a number from each value in a
DataFrame or specific column:

DataFrame['A'].sub(10)

15. DataFrame.mul

This is a multiplication version of the addition function of pandas:

DataFrame['A'].mul(10)

16. DataFrame.div

We can divide each data point in a column or DataFrame by a specific number:

DataFrame['A'].div(2)

17. DataFrame. std

Using the std() function, pandas also lets you compute the standard deviation for each
column in a DataFrame. It works by iterating through each column in a dataset and calculating
the standard deviation for each:

DataFrame.std()

18. DataFrame.melt

The melt() function in pandas flips the columns in a DataFrame to individual rows. It's
like exposing the anatomy of a DataFrame. So it lets you view the value assigned to each
column explicitly.

newDataFrame = DataFrame.melt()

print(newDataFrame)
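
What melt() produces is easiest to see on the two-column frame built in the first command. A minimal sketch, with the variable names frame and long_frame introduced only for illustration:

```python
import pandas as pd

# Same values as the A/B frame created with pandas.DataFrame above.
frame = pd.DataFrame({"A": [1, 3, 4], "B": [5, 9, 12]})

# Each (column, value) pair becomes one row with the columns
# "variable" (the former column name) and "value".
long_frame = frame.melt()
print(long_frame)
```

The three values of A come first, followed by the three values of B, so a frame with 3 rows and 2 columns turns into 6 rows.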

19. DataFrame.pop

This function lets you remove a specified column from a pandas DataFrame. It accepts
an item keyword, returns the popped column, and separates it from the rest of the DataFrame:

DataFrame.pop(item= 'B')

print(DataFrame)

20. DataFrame.dropna

The dropna() method removes all rows containing null values:

DataFrame.dropna(inplace = True)

print(DataFrame)

Result:

Thus the various commands in Data Frame are executed successfully.


Ex.No:5 Univariate analysis: Frequency, Mean, Median, Mode,
Variance, Standard Deviation, Skewness and Kurtosis

Aim:
To find Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis using diabetes dataset.
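
Each quantity named in the aim can be obtained with a single pandas call. The sketch below shows them on a short made-up series; the readings and the dictionary name stats are assumptions for illustration and are not values from the diabetes file.

```python
import pandas as pd

# Hypothetical glucose-like readings, used only for illustration.
values = pd.Series([85, 89, 89, 103, 116, 137, 183], name="Glucose")

stats = {
    "frequency": values.value_counts().to_dict(),  # how often each value occurs
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),                # may contain several values
    "variance": values.var(),                      # sample variance (ddof=1)
    "std": values.std(),                           # sample standard deviation (ddof=1)
    "skewness": values.skew(),                     # > 0 means a longer right tail
    "kurtosis": values.kurt(),                     # excess kurtosis (normal -> 0)
}

for name, value in stats.items():
    print(name, ":", value)
```

On the real data the same calls can be applied to a column such as df['Glucose'] once the file has been read.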

Read Diabetes data set:
import pandas as pd
df = pd.read_csv('diabetes.csv')
df.head()
df.shape
Output:
(768, 9)
df.dtypes
Output:
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object

df['Outcome'] = df['Outcome'].astype('bool')
df.dtypes['Outcome']
Output:
dtype('bool')

df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies 768 non-null int64
Glucose 768 non-null int64
BloodPressure 768 non-null int64
SkinThickness 768 non-null int64
Insulin 768 non-null int64
BMI 768 non-null float64
DiabetesPedigreeFunction 768 non-null float64
Age 768 non-null int64
Outcome 768 non-null bool
dtypes: bool(1), float64(2), int64(6)
memory usage: 48.9 KB

df.describe().T

Pregnancy Proportion:
import numpy as np
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(np.round(preg_proportion/sum(preg_proportion),3)*100,dtype=int)
preg = pd.DataFrame({'month':preg_month,'count_of_preg_prop':preg_proportion,
                     'percentage_proportion':preg_proportion_perc})
preg.set_index(['month'],inplace=True)
preg.head(10)

import seaborn as sns
import matplotlib.pyplot as plt
# Note: sns.distplot is deprecated in recent seaborn releases; sns.histplot(..., kde=True) is its replacement.
fig,axes = plt.subplots(nrows=3,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.countplot(x='Pregnancies',data=df,ax=axes[0][0],color='green')
axes[0][0].set_title('Count',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.countplot(x='Pregnancies',data=df,hue='Outcome',ax=axes[0][1])
axes[0][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count',fontdict={'fontsize':7})
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6') # for legend text
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6') # for legend title
plt.tight_layout()

plot10 = sns.distplot(df['Pregnancies'],ax=axes[1][0])
axes[1][0].set_title('Pregnancies Distribution',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][0].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plt.tight_layout()

plot11 = df[df['Outcome']==False]['Pregnancies'].plot.hist(ax=axes[1][1],label='Non-Diab.')
plot11_2=df[df['Outcome']==True]['Pregnancies'].plot.hist(ax=axes[1][1],label='Diab.')
axes[1][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[1][1].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][1].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plot11.axes.legend(loc=1)
plt.setp(axes[1][1].get_legend().get_texts(), fontsize='6') # for legend text
plt.setp(axes[1][1].get_legend().get_title(), fontsize='6') # for legend title
plt.tight_layout()

# y= draws a vertical box plot (keyword form works across seaborn versions)
plot20 = sns.boxplot(y=df['Pregnancies'],ax=axes[2][0])
axes[2][0].set_title('Pregnancies',fontdict={'fontsize':8})
axes[2][0].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][0].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.tight_layout()

plot21 = sns.boxplot(x='Outcome',y='Pregnancies',data=df,ax=axes[2][1])
axes[2][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[2][1].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][1].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
plt.tight_layout()
plt.show()

Understanding Distribution
The distribution of Pregnancies in the data is unimodal and skewed to the right, centered at
about 1, with most of the data between 0 and 15, a range of roughly 15, and outliers
present on the higher end.
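
A right-skewed distribution can be recognised numerically as well: the long upper tail pulls the mean above the median and makes the skewness coefficient positive. A minimal sketch on invented counts (not the actual Pregnancies column):

```python
import pandas as pd

# Hypothetical pregnancy counts with a long right tail.
pregnancies = pd.Series([0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 10, 15])

mean = pregnancies.mean()
median = pregnancies.median()
skew = pregnancies.skew()

print("mean:", mean, "median:", median, "skewness:", skew)
```

The same three calls on df['Pregnancies'] give a quick numeric check of what the distribution plot suggests.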

Glucose Variable

df.Glucose.describe()
Output:
count 768.000000
mean 120.894531
std 31.972618
min 0.000000
25% 99.000000
50% 117.000000
75% 140.250000
max 199.000000
Name: Glucose, dtype: float64

#sns.set_style('darkgrid')
fig,axes = plt.subplots(nrows=2,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.distplot(df['Glucose'],ax=axes[0][0],color='green')
#axes[0][0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0][0].set_title('Distribution of Glucose',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Glucose Class',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.distplot(df[df['Outcome']==False]['Glucose'],ax=axes[0][1],color='green',label='Non Diab.')
sns.distplot(df[df.Outcome==True]['Glucose'],ax=axes[0][1],color='red',label='Diab')
axes[0][1].set_title('Distribution of Glucose',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Glucose Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
#axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

# y= draws a vertical box plot (keyword form works across seaborn versions)
plot10=sns.boxplot(y=df['Glucose'],ax=axes[1][0])
axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Glucose',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.tight_layout()

plot11=sns.boxplot(x='Outcome',y='Glucose',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()

plt.show()


Understanding Distribution
The distribution of Glucose level among patients is unimodal and roughly bell shaped,
centered at abo ut 115 with most of the data between 90 and 140, A range of 150, and
outliers are pres roughly ent on the lower end(Glucose ==0).

verify distributio n by keeping only non zero entry of Glucose:

fig,axes = plt.s ubplots(nrows=1,ncols=2,dpi=120,figsize = (8,4))

plot0=sns.dist plot(df[df['Glucose']!=0]['Glucose'],ax=axes[0],color='green')
#axes[0].yaxis.s et_major_formatter(FormatStrFormatter('%.3f'))
axes[0].set_title( 'Distribution of Glucose',fontdict={'fontsize':8})
axes[0].set_xla bel('Glucose Class',fontdict={'fontsize':7})
axes[0].set_yla bel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot1=sns.boxplot(df[df['Glucose']!=0]['Glucose'],ax=axes[1],orient='v')
axes[1].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1].set_xlabel('Glucose',fontdict={'fontsize':7})
axes[1].set_ylabel(r'Five Point Summary(Glucose)',fontdict={'fontsize':7})
plt.tight_layout()
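
The effect of the zero filter above can also be checked numerically before plotting. The following is a minimal sketch, assuming a small made-up Series in place of df['Glucose'] (the values are hypothetical and are not taken from diabetes.csv):

```python
import pandas as pd

# Hypothetical stand-in for df['Glucose']; the real values come from diabetes.csv.
glucose = pd.Series([0, 85, 99, 117, 0, 140, 183, 112], name="Glucose")

nonzero = glucose[glucose != 0]          # same filter as df[df['Glucose']!=0]['Glucose']
n_dropped = len(glucose) - len(nonzero)  # how many zero readings were excluded

print("zero readings removed:", n_dropped)
print(nonzero.describe())                # summary statistics without the zeros
```

On the real data, the same two lines applied to df['Glucose'] should show whether the minimum moves away from 0 once the zero readings are excluded.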


Blood Pressure variable

df.BloodPressure.describe()

count    768.000000
mean      69.105469
std       19.355807
min        0.000000
25%       62.000000
50%       72.000000
75%       80.000000
max      122.000000
Name: BloodPressure, dtype: float64

fig,axes = plt
plot00=sns.
axes[0][0].
axes[0][0].
axes[0][0].
axes[0][0].
plt.

plot01=sns.distplot(df[df['Outcome']==False]['BloodPressure'],ax=axes[0][1],color='green',label='Non Diab.')
sns.distplot(df[df.Outcome==True]['BloodPressure'],ax=axes[0][1],color='red',label='Diab')
axes[0][1].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][1].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10=sns.boxplot(df['BloodPressure'],ax=axes[1][0],orient='v')

axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('BP',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()

plot11=sns.boxplot(x='Outcome',y='BloodPressure',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()

plt.show()
Understanding Distribution
The distribution of BloodPressure among patients is unimodal and bell shaped (it is not
bimodal, because BP = 0 does not make sense and is an outlier), centered at about 65,
with most of the data between 60 and 90 and a range of roughly 100. Outliers are present
on the lower end (BP == 0).

5b) Bivariate analysis: Linear and logistic regression modeling

import os
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

os.chdir("C:/Users/Administrator/Desktop/DS")
df = pd.read_csv('diabetes.csv')
df.head()

# x and y are passed as keywords; recent seaborn versions do not accept them positionally
sns.scatterplot(x=df.DiabetesPedigreeFunction,y=df.Glucose)
plt.ylim(0,


# x and y are passed as keywords; recent seaborn versions do not accept them positionally
sns.scatterplot(x=df.BMI,y=df.Age)
plt.ylim(0,20000)

sns.scatterplot(x=df.BloodPressure,y=df.Glucose)
plt.ylim(0,20000)

plt.figure(figsize=(12,8))
sns.kdeplot(data=df,x=df.Glucose,hue=df.Outcome,fill=True)
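
The heading of this part mentions linear regression, while the code above only draws scatter and density plots. A minimal sketch of a simple (one predictor) linear fit is given below. It assumes synthetic data generated with NumPy rather than the diabetes.csv columns, and the variable names are illustrative only:

```python
import numpy as np

# Synthetic data (an assumption for illustration, not taken from diabetes.csv).
rng = np.random.default_rng(0)
x = rng.uniform(20, 45, size=200)                 # predictor, e.g. a BMI-like variable
y = 60 + 2.0 * x + rng.normal(0, 5, size=200)     # response with known slope 2.0

# Least-squares straight line y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation; for a one-predictor linear fit, R^2 equals r squared.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

With the real data, x and y would be two numeric columns of df, for example df.BMI and df.Glucose after the zero readings are handled.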


5C) Multiple Regression analysis

df.isnull().values.any()
False

(df.Pregnancies == 0).sum(),(df.Glucose==0).sum(),(df.BloodPressure==0).sum(),(df.SkinThickness==0).sum(),(df.Insulin==0).sum(),(df.BMI==0).sum(),(df.DiabetesPedigreeFunction==0).sum(),(df.Age==0).sum()
## Counting cells with 0 Values for each variable and publishing the counts below

Output:
(111, 5, 35, 227, 374, 11, 0, 0)

drop_Glu=df.index[df.Glucose == 0].tolist()
drop_BP=df.index[df.BloodPressure == 0].tolist()
drop_Skin = df.index[df.SkinThickness==0].tolist()
drop_Ins = df.index[df.Insulin==0].tolist()
drop_BMI = df.index[df.BMI==0].tolist()
c=drop_Glu+drop_BP+drop_Skin+drop_Ins+drop_BMI
dia=df.drop(df.index[c])
dia.info()

Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 3 to 765
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               392 non-null    int64
 1   Glucose                   392 non-null    int64
 2   BloodPressure             392 non-null    int64
 3   SkinThickness             392 non-null    int64
 4   Insulin                   392 non-null    int64
 5   BMI                       392 non-null    float64
 6   DiabetesPedigreeFunction  392 non-null    float64
 7   Age                       392 non-null    int64
 8   Outcome                   392 non-null    bool
dtypes: bool(1), float64(2), int64(6)
memory usage: 27.9 KB
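
The zero counting and row dropping steps above can also be written more compactly with boolean masks. The sketch below assumes a tiny hypothetical DataFrame that only reuses the column names; the values are made up and the real data would come from diabetes.csv:

```python
import pandas as pd

# Tiny hypothetical frame with the same column names; values are made up.
df = pd.DataFrame({
    "Glucose":       [148, 0, 183, 89, 137],
    "BloodPressure": [72, 66, 0, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin":       [0, 94, 0, 94, 168],
    "BMI":           [33.6, 26.6, 23.3, 0.0, 43.1],
})

# Count the zeros in every column at once.
zero_counts = (df == 0).sum()
print(zero_counts)

# Keep only the rows where none of the listed columns is zero.
cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
dia = df[(df[cols] != 0).all(axis=1)]
print(dia.shape)
```

The mask avoids building and concatenating index lists, and it does not depend on the DataFrame having the default integer index.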

cor = dia.corr(method ='pearson')

cor

Output

sns.heatmap(cor)
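
The correlation matrix shows pairwise relationships only. A multiple regression, as named in the heading of this part, fits one response on several predictors at the same time. The following is a minimal sketch using NumPy least squares on synthetic data; the predictors, coefficients and noise level are assumptions chosen for illustration and are not results from the diabetes data:

```python
import numpy as np

# Synthetic predictors and response (assumed for illustration only).
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))                       # three predictors
true_coef = np.array([1.5, -0.7, 0.3])
y = 2.0 + X @ true_coef + rng.normal(0, 0.1, n)   # intercept 2.0 plus small noise

# Add a column of ones so the intercept is estimated together with the slopes.
A = np.column_stack([np.ones(n), X])
beta, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print("intercept:", round(beta[0], 3))
print("slopes   :", np.round(beta[1:], 3))
```

With the cleaned frame dia, X would hold the chosen predictor columns and y the response column, both converted to NumPy arrays.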

Result:
Thus the univariate, bivariate and multivariate analyses were performed successfully.


Ex.No:6 Apply and explore various plotting functions on UCI data sets

Aim:
To apply and explore various plotting functions on UCI data sets.
To load and quickly visualize the Multiple Features Dataset [1] from the UCI repository, which
is available in mvlearn. This dataset can be a good tool for analyzing the effectiveness
of multiview algorithms. It contains 6 views of handwritten digit images, thus allowing for
analysis of multiview algorithms in multiclass or unsupervised tasks.

a. Normal curves
A probability distribution is a statistical function that describes the likelihood of obtaining the
possible values that a random variable can take. By this, we mean the range of values that a
parameter can take when we randomly pick values from it. Suppose we were asked to pick 1 adult
at random and say what his/her height would be (assuming gender does not affect height).
There is no way to know what the height will be. But if we have the distribution of heights of
adults in the city, we can bet on the most probable outcome. A Normal Distribution is also known
as a Gaussian distribution or, famously, the Bell Curve. People use these names interchangeably,
and they mean the same thing. It is a continuous probability distribution.
Code:
import numpy as np
import matplotlib.pyplot as plt

# Creating a series of data in the range of 1-50.
x = np.linspace(1,50,200)

# Creating a function for the normal probability density.
def normal_dist(x , mean , sd):
    # The normalising constant is 1/(sd*sqrt(2*pi)), so the curve integrates to 1.
    prob_density = 1/(sd*np.sqrt(2*np.pi)) * np.exp(-0.5*((x-mean)/sd)**2)
    return prob_density

# Calculate mean and standard deviation.
mean = np.mean(x)
sd = np.std(x)

# Apply function to the data.
pdf = normal_dist(x,mean,sd)

# Plotting the results
plt.plot(x,pdf , color = 'red')
plt.xlabel('Data points')
plt.ylabel('Probability Density')
plt.show()
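
A quick numerical check helps confirm that a density function is normalised: the area under the normal curve f(x) = 1/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)/sd)**2) should be 1. The sketch below assumes arbitrary illustrative values for the mean and standard deviation and uses a simple Riemann sum over six standard deviations on either side of the mean:

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal density 1/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)/sd)**2)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

mean, sd = 25.5, 14.0                      # illustrative values, chosen arbitrarily
x = np.linspace(mean - 6 * sd, mean + 6 * sd, 2001)
dx = x[1] - x[0]

area = np.sum(normal_pdf(x, mean, sd)) * dx   # Riemann-sum approximation of the integral
peak = normal_pdf(mean, mean, sd)             # density at the mean

print(f"area under curve ~ {area:.6f}")
print(f"peak height      = {peak:.6f}")
```

Note that the grid in the lab code (1 to 50) covers less than two standard deviations on each side of its mean, so the area under that plotted segment is smaller than 1 even though the formula itself is normalised.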

b. Density and contour plots

Contour plots, also called level plots, are a tool for doing multivariate analysis and visualizing
3-D plots in 2-D space. If we consider X and Y as the variables we want to plot, then the response
Z is plotted as slices on the X-Y plane, which is why contours are sometimes referred to as
Z-slices or iso-response.

Contour plots are widely used to visualize density, altitudes or heights of mountains, as well
as in meteorology. Due to such wide usage, matplotlib.pyplot provides a method contour to make
it easy for us to draw contour plots.

Code:
import matplotlib.pyplot as plt
import numpy as np

feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)

# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)

fig, ax = plt.subplots(1, 1)

Z = np.cos(X / 2) + np.sin(Y / 4)

# plots contour lines
ax.contour(X, Y, Z)

ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')

plt.show()
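
The example above draws contour lines of a known function. For a density plot of observed data, one possible approach is to bin the points on a grid and draw filled contours of the counts. The sketch below assumes a synthetic two-variable sample, a 30 x 30 grid, and an output file name chosen only for illustration; it uses the off-screen Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")          # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 2-D sample (assumed data; any two numeric columns could be used instead).
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(0, 1, n)
y = 0.5 * x + rng.normal(0, 1, n)

# Bin the points on a 30 x 30 grid; H[i, j] counts points in x-bin i and y-bin j.
H, xedges, yedges = np.histogram2d(x, y, bins=30)
xc = 0.5 * (xedges[:-1] + xedges[1:])   # bin centres along x
yc = 0.5 * (yedges[:-1] + yedges[1:])   # bin centres along y

fig, ax = plt.subplots(1, 1)
cs = ax.contourf(xc, yc, H.T, levels=10)  # transpose: contourf expects rows = y
fig.colorbar(cs, ax=ax, label='count per bin')
ax.set_title('Density (filled contour)')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.savefig('density_contour.png')        # hypothetical output file name
```

In an interactive session, the matplotlib.use line can be dropped and fig.savefig replaced by plt.show().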

c. Correlation and scatter plots

Correlation means an association. It is a measure of the extent to which two variables are related.
1. Positive Correlation: When two variables increase together and decrease together, they are
positively correlated. '1' is a perfect positive correlation. For example, demand and profit are
positively correlated: the more the demand for the product, the more the profit.
2. Negative Correlation: When one variable increases as the other decreases, and vice-versa, they
are negatively correlated. For example, if the distance between two magnets increases, their
attraction decreases, and vice-versa. '-1' is a perfect negative correlation.
3. Zero Correlation (No Correlation): When two variables don't seem to be linked at all. '0' is no
correlation. For example, the amount of tea you take and your level of intelligence.
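
The three cases above can be illustrated with the Pearson correlation coefficient computed by np.corrcoef. The data below are small made-up arrays chosen so that the coefficient comes out at the two extremes and at zero:

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5

y_pos = 2 * x + 1      # rises with x
y_neg = -3 * x + 5     # falls as x rises
y_zero = x ** 2        # symmetric about 0, so no linear trend

def pearson_r(a, b):
    # np.corrcoef returns the 2 x 2 correlation matrix; [0, 1] is r between a and b.
    return np.corrcoef(a, b)[0, 1]

r_pos, r_neg, r_zero = pearson_r(x, y_pos), pearson_r(x, y_neg), pearson_r(x, y_zero)
print(r_pos, r_neg, r_zero)
```

The third case also shows that a coefficient of 0 only rules out a linear relationship: y_zero is completely determined by x, yet its Pearson correlation with x is zero.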

Code:
import pandas as pd

con = pd.read_csv('concrete.csv')
con

list(con.columns)
con.head()

con['cement'] = con['cement'].astype('category')
con.describe(include='category')

import seaborn as sns

sns.scatterplot(x="water", y="coarseagg", data=con);

# Title and x-label now match the plotted columns (water on x, coarseagg on y)
ax = sns.scatterplot(x="water", y="coarseagg", data=con)
ax.set_title("Coarse aggregate vs. Water")
ax.set_xlabel("water");

sns.lmplot(x="water", y="coarseagg", data=con);

d. Histograms:
A histogram is used to represent data that is provided in the form of groups. It is an accurate
method for the graphical representation of a numerical data distribution. It is a type of bar plot
where the X-axis represents the bin ranges while the Y-axis gives information about frequency.

Creating a Histogram
To create a histogram, the first step is to create bins of the ranges, then distribute the whole
range of the values into a series of intervals, and count the values which fall into each of the
intervals. Bins are clearly identified as consecutive, non-overlapping intervals of variables. The
matplotlib.pyplot hist() function is used to compute and draw the histogram.

Code:
from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])

# Creating histogram
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])

# Show plot
plt.show()
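
The binning step described above can be inspected without drawing anything by calling np.histogram on the same data and bin edges. This is a minimal sketch of the counting only:

```python
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
bins = [0, 25, 50, 75, 100]

# np.histogram returns the count per bin and the bin edges, without drawing anything.
# Every bin is half-open [lo, hi) except the last one, which also includes its right edge.
counts, edges = np.histogram(a, bins=bins)

for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo} to {hi}: {c}")
```

The counts are the bar heights that ax.hist draws for the same bins, and they add up to the number of values in a.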

Code:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25

# Creating histogram
fig, axs = plt.subplots(1, 1,figsize =(10, 7),tight_layout = True)
axs.hist(x, bins = n_bins)

# Show plot
plt.show()

e. Three dimensional plotting

Matplotlib was introduced with only two-dimensional plotting in mind. Around the time of the
1.0 release, 3d utilities were developed on top of the 2d ones, and thus we have a 3d
implementation of data available today. The 3d plots are enabled by importing the
mplot3d toolkit. In this section, we will deal with 3d plots using matplotlib.

Code:
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection ='3d')

# defining axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
c = x + y
ax.scatter(x, y, z, c = c)

# syntax for plotting
ax.set_title('3d Scatter plot')
plt.show()

Result:
Thus the various plots are executed and plotted successfully.


Ex.No:7 Visualizing Geographic Data with Basemap

Aim:
To visualize geographic data with Basemap.
One common type of visualization in data science is that of geographic data. Matplotlib's main
tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib
toolkits that live under the mpl_toolkits namespace. Admittedly, Basemap feels a bit clunky
to use, and often even simple visualizations take much longer to render than you might hope.
More modern solutions such as leaflet or the Google Maps API may be a better choice for more
intensive map visualizations. Still, Basemap is a useful tool for Python users to have in their
virtual toolbelts. In this section, we'll show several examples of the type of map visualization
that is possible with this toolkit.
Installation of Basemap is straightforward; if you're using conda you can type this and
the package will be downloaded:
conda install basemap
Code:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
# width/height are truncated in the source; 8E6 (metres) is an assumed value,
# since the 'lcc' projection needs a map extent
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines()
m.drawcoastlines(linewidth=1.0, linestyle='dashed', color='red')
plt.title("Coastlines", fontsize=20)
plt.show()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
from shapely.geometry import Point

fp = r'Map
map_df_copy = gpd.read_file(fp)

# Plot the GeoDataFrame that was read above (map_df is not defined in this listing)
map_df_copy.plot(markersize=5)


Result:

Thus Basemap was installed and the program for geographic visualization was executed successfully.
