FDS Lab Manual-1

Fods lab

Uploaded by

anbesivam548
Copyright
© © All Rights Reserved

Ex. No: 1
DATE: Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas

How to Install Python library Functions

AIM:
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages in Python.

ALGORITHM:

STEP 1: Make sure the computer is connected to the internet; pip downloads NumPy, SciPy, Jupyter,
statsmodels and pandas from the network.
STEP 2: Open the Run window, type cmd and press Enter to open the Command Prompt.
STEP 3: Use the cd (change directory) command to move to the Python installation.
STEP 4: Change to the Scripts folder, for example
C:\Users\students\AppData\Local\Programs\Python\Python36-32\Scripts.
STEP 5: After setting the path, check the Python and pip versions. Install NumPy using
pip install numpy.
STEP 6: Install SciPy using pip install scipy.
STEP 7: Install statsmodels using pip install statsmodels.
STEP 8: Install pandas using pip install pandas.

To check Python and pip versions

python --version
pip --version
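
Once the packages are installed, the same check can also be made from inside Python. The following is only a minimal sketch and not part of the original procedure: the helper name installed_versions is hypothetical, and it assumes Python 3.8 or later so that the standard-library importlib.metadata module is available.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


wanted = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]
for name, ver in installed_versions(wanted).items():
    print(f"{name:12s} {ver or 'not installed'}")
```

A package reported as "not installed" can then be added with the matching pip install command from the steps above.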

RESULT:

Thus the Python libraries have been installed and their features explored successfully.

Ex. No 2
DATE: Working with Numpy arrays
Aim

To implement array object using Numpy module in Python programming


Algorithm

Step 1: Start the program

Step 2: Import the required packages

Step 3: Read the elements through list/tuple/dictionary

Step 4: Convert List/tuple/dictionary into array using built-in methods


Step 5: Check the number of dimensions in an array

Step 6: Compute the shape of an array or if it’s required reshape an array


Step 7: Do the required operations like slicing, iterating, searching, concatenating
and splitting an array element.

Step 8: Stop the program
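
Steps 3 to 6 can be sketched as follows. This is an illustrative example with made-up values; the variable names are assumptions and do not come from the programs below.

```python
import numpy as np

marks_list = [1, 2, 3]
marks_tuple = (4, 5, 6)
marks_dict = {"a": 7, "b": 8, "c": 9}

from_list = np.array(marks_list)
from_tuple = np.array(marks_tuple)
# Passing a dict directly gives a 0-D object array, so convert its values first.
from_dict = np.array(list(marks_dict.values()))
# Nesting equal-length sequences gives a 2-D array.
stacked = np.array([marks_list, marks_tuple])

print(from_list, from_tuple, from_dict)
print(stacked.ndim, stacked.shape)
```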


(i) Create a NumPy ndarray Object
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output

[1 2 3 4 5]

<class 'numpy.ndarray'>
(ii) Dimensions in Arrays
0-D Arrays
Program

import numpy as np
arr = np.array(42)
print(arr)

Output
42
1-D Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output
[1 2 3 4 5]

2-D Arrays
Program
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output
[[1 2 3]
 [4 5 6]]

3-D arrays
Program

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Output
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
(iii) Check Number of Dimensions?
Program

import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Output
0
1
2
3

(iv) Access Array Elements


Program

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Output
1

Program

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Output
7

(v) Slicing arrays


Program

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output

[2 3 4 5]
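
Slices also accept a step and negative indexes, and they work along each axis of a 2-D array. The sketch below is an added illustration; the array grid is made up for the example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

every_second = arr[::2]     # start:stop:step with a step of 2
last_three = arr[-3:]       # negative indexes count from the end

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
block = grid[:, 1:3]        # all rows, columns 1 and 2

print(every_second)
print(last_three)
print(block)
```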

(vi) NumPy Array Shape


Program
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output
(2, 4)

(vii) Reshaping arrays
Program
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)
print(newarr)
Output

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
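
A reshape only succeeds when the new shape holds exactly the same number of elements. As a small added sketch, one dimension can be given as -1 so that NumPy works it out, and an impossible shape raises ValueError.

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements: 1..12

auto = arr.reshape(2, -1)       # -1 lets NumPy infer the 6 columns
print(auto.shape)

try:
    arr.reshape(5, 3)           # 5 * 3 = 15 elements, but only 12 exist
except ValueError as err:
    print("reshape failed:", err)
```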

(viii) Iterating Arrays


Program

import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)

Output

1
2
3
(ix) Joining NumPy Arrays
Program

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output

[1 2 3 4 5 6]

(x) Splitting NumPy Arrays


Program

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])


newarr = np.array_split(arr, 3)
print(newarr)

Output
[array([1, 2]), array([3, 4]), array([5, 6])]
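
The example above divides evenly. As an added sketch, array_split also accepts a length that does not divide evenly, whereas np.split raises an error in that case.

```python
import numpy as np

arr = np.arange(1, 8)                 # 7 elements
parts = np.array_split(arr, 3)        # an uneven split is allowed
sizes = [len(p) for p in parts]
print(parts, sizes)

try:
    np.split(arr, 3)                  # np.split insists on equal parts
except ValueError as err:
    print("np.split failed:", err)
```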

(xi) Searching Arrays


Program

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

Output
(array([3, 5, 6]),)

(xii) Sorting Arrays


Program

import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
Output

[0 1 2 3]

RESULT:

Thus Array object has been explored using Numpy module in Python programming
successfully.

Exp. No. 3.
DATE: Working with Pandas data frames

Aim:
To work with DataFrame object using Pandas module in Python Programming

Algorithm:

Step 1: Start the program

Step 2: Import the required packages

Step 3: Create a DataFrame using built in method.

Step 4: Load data into a DataFrame object, or load files (Excel/CSV) into a DataFrame

Step 5: Display the rows and describe the data set using built in method.

Step 6: Display the last 5 rows of the DataFrame.

Step 7: Check the number of maximum returned rows


Step 8: Stop the program

(i) Create a simple Pandas DataFrame:


Program
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Output

   calories  duration
0       420        50
1       380        40
2       390        45
(ii) Locate Row


Program
print(df.loc[0])

Output

calories 420

duration 50

Name: 0, dtype: int64

Note: This example returns a Pandas Series.

(iv) Use a list of indexes:

Program

print(df.loc[[0, 1]])

Output

calories duration

0 420 50
1 380 40

Note: When using [], the result is a Pandas DataFrame.

(v) Named Indexes


Program
import pandas as pd
data = {

"calories": [420, 380, 390],

"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])


print(df)

Output

calories duration

day1 420 50
day2 380 40

day3 390 45

(vi) Locate Named Indexes

print(df.loc["day2"])

Output

calories 380

duration 40

Name: day2, dtype: int64
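
With named indexes, loc selects by label while iloc still selects by position. The sketch below is an added illustration that reuses the same small DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

by_label = df.loc["day2"]                # row selected by index label
by_position = df.iloc[1]                 # the same row selected by position
one_value = df.loc["day3", "duration"]   # single cell: row label, column name

print(by_label.equals(by_position), one_value)
```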

(vii) Load Files Into a DataFrame


Program
import pandas as pd

df = pd.read_csv('data.csv')

print(df)

Output

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140
165        60    110       145
166        60    115       145
167        75    120       150
168        75    125       150

[169 rows x 4 columns]

(viii) Check the number of maximum returned rows:
Program

import pandas as pd

print(pd.options.display.max_rows)

In my system the number is 60, which means that if the DataFrame contains
more than 60 rows, the print(df) statement will return only the headers and the
first and last 5 rows.

import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
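
Assigning to pd.options changes the setting for the rest of the session. If the larger limit is only needed for one print, a temporary override is an alternative; the following is a minimal sketch of that approach using pd.option_context.

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")

# Inside the block up to 9999 rows are printed in full;
# on exit the previous setting is restored automatically.
with pd.option_context("display.max_rows", 9999):
    inside_rows = pd.get_option("display.max_rows")

after_rows = pd.get_option("display.max_rows")
print(default_rows, inside_rows, after_rows)
```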

(ix) Viewing the Data


Program

import pandas as pd
df = pd.read_csv('data.csv')

print(df.head(4))

Output

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4

(x) Print the last 5 rows of the DataFrame:
print(df.tail())
print(df.info())
Output

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None
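
In the info() output, a Non-Null Count below the number of entries (164 of 169 for Calories) means that the column has missing values. The sketch below shows one way to count them; it uses a tiny made-up frame, not the real data.csv.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame for illustration, not the real data.csv.
df = pd.DataFrame(
    {
        "Duration": [60, 60, 45, 45],
        "Calories": [409.1, np.nan, 282.4, np.nan],
    }
)

missing_per_column = df.isna().sum()   # NaN count per column
non_null = df.count()                  # what info() reports as non-null
cleaned = df.dropna()                  # drop rows that contain any NaN

print(missing_per_column)
print(non_null)
print(len(cleaned))
```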

RESULT:

Thus DataFrame object using Pandas module in Python Programming has been
successfully explored

Exp. No. 4.
DATE: Reading data from text files, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris
data set.
Aim:

To perform descriptive analytics on Iris dataset using Python programming

Algorithm
Step 1: Start the program

Step 2: Import the required packages

Step 3: Load Files(excel/csv/ text) into a DataFrame from Iris data set

Step 4: Display the rows and describe the data set using built in methods

Step 5: Compare Petal Length and Petal Width

Step 6: Visualize the data set using histograms with distplot, heatmaps and box plots
Step 7: Check Missing Values, Duplicates and remove outliers
Step 8: Stop the program

Program
import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# Printing top 5 rows
df.head()
Output:

Getting Information about the Dataset

df.shape
Output:

(150, 6)

df.info()
Output

df.describe()

Checking Missing Values

df.isnull().sum()

Checking Duplicates

data = df.drop_duplicates(subset ="Species",)


data

Output

df.value_counts("Species")
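
Note that drop_duplicates(subset="Species") keeps the first row of each species, so it lists the distinct species rather than detecting repeated rows. A minimal sketch of counting fully repeated rows is given below; the rows are made up in the shape of the Iris columns.

```python
import pandas as pd

# Made-up rows for illustration; the last row repeats the first.
df = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 7.0, 5.1],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.5],
        "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-setosa"],
    }
)

repeated_rows = df.duplicated().sum()                      # rows identical to an earlier row
first_per_species = df.drop_duplicates(subset="Species")   # one row per species
deduplicated = df.drop_duplicates()                        # repeated rows removed

print(repeated_rows, len(first_per_species), len(deduplicated))
```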

Data Visualization

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df, )
plt.show()

Comparing Sepal Length and Sepal Width

# importing packages
import seaborn as sns

import matplotlib.pyplot as plt


sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',
                hue='Species', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.pairplot(df.drop(['Id'], axis = 1),
hue='Species', height=2)

Output:

Histograms

Program

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5)
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6)
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6)

Output:
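
The bins argument sets how many equal-width intervals the value range is cut into; each bar is the count of values in one interval. The counts behind a histogram can be inspected with np.histogram, as in this sketch with made-up values.

```python
import numpy as np

# Made-up sepal lengths in cm, for illustration only.
values = np.array([4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.3, 6.4, 7.0, 7.6])

counts, edges = np.histogram(values, bins=3)
print(counts)   # number of values in each bin
print(edges)    # bins + 1 equally spaced edges from min to max
```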

Histograms with Distplot Plot

Program

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

plot = sns.FacetGrid(df, hue="Species")


plot.map(sns.distplot, "SepalLengthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalWidthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalLengthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalWidthCm").add_legend()
plt.show()

Output:

Handling Correlation

# numeric_only=True (pandas 1.5+) skips the text column Species
data.corr(method='pearson', numeric_only=True)

Output:
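
The Pearson coefficient reported by corr() can be checked by hand from its definition. The following sketch uses made-up measurements, not the Iris file, and compares the manual value with the pandas result.

```python
import numpy as np
import pandas as pd

# Made-up measurements, for illustration only.
df = pd.DataFrame(
    {
        "PetalLengthCm": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
        "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    }
)

x = df["PetalLengthCm"].to_numpy()
y = df["PetalWidthCm"].to_numpy()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx and dy the deviations from the mean
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = df.corr(method="pearson").loc["PetalLengthCm", "PetalWidthCm"]
print(round(r_manual, 4), round(r_pandas, 4))
```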

Heatmaps

Program

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# numeric_only=True (pandas 1.5+) skips the text column Species
sns.heatmap(df.corr(method='pearson', numeric_only=True).drop(
    ['Id'], axis=1).drop(['Id'], axis=0),
    annot=True)
plt.show()

Output:

Box Plots
Program

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
def graph(y):
    sns.boxplot(x="Species", y=y, data=df)

plt.figure(figsize=(10,10))
# Adding the subplot at the specified
# grid position
plt.subplot(221)
graph('SepalLengthCm')
plt.subplot(222)
graph('SepalWidthCm')
plt.subplot(223)
graph('PetalLengthCm')
plt.subplot(224)
graph('PetalWidthCm')
plt.show()

Output:

Program

# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Iris.csv')
sns.boxplot(x='SepalWidthCm', data=df)

Output:

Removing Outliers
Program
# Importing
import numpy as np
import pandas as pd
import seaborn as sns

# Load the dataset
df = pd.read_csv('Iris.csv')

# IQR (NumPy 1.22+ uses method=; older releases call it interpolation=)
Q1 = np.percentile(df['SepalWidthCm'], 25, method='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, method='midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5 * IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1 - 1.5 * IQR))

# Removing the Outliers
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)
print("New Shape: ", df.shape)
sns.boxplot(x='SepalWidthCm', data=df)

Output:
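
The same IQR rule can be written as a reusable filter with a boolean mask. This is a sketch under assumptions, not the program above: the helper name remove_iqr_outliers is hypothetical, the quartiles use the default linear interpolation of pandas rather than 'midpoint', values lying exactly on a fence are kept, and the data are made up.

```python
import pandas as pd


def remove_iqr_outliers(frame, column, k=1.5):
    """Return the rows of `frame` whose `column` lies inside the IQR fences."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    inside = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[inside]


# Made-up widths with one obvious outlier (9.0).
df = pd.DataFrame({"SepalWidthCm": [2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 9.0]})
cleaned = remove_iqr_outliers(df, "SepalWidthCm")
print(df.shape, cleaned.shape)
```

Because the function returns a new frame instead of dropping rows in place, the original data remain available for comparison.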

RESULT:

Thus Iris dataset has been explored and descriptively analysed using Python
programming

Exp. No. 5.
DATE: Use the diabetes data set from UCI and Pima Indians Diabetes
data set for performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
Aim:

To perform various exploratory data analysis on Pima Indians Diabetes dataset
using Python Programming
Algorithm

Step 1: Start the program

Step 2: Import the required packages

Step 3: Load Files (excel/csv/ text) into a Data Frame from UCI and Pima Indians
Diabetes data set

Step 4: Display the rows and describe the data set using built in methods

Step 5: Compute Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis

Step 6: Visualize the data set using histograms with distplot, heatmaps and box plots
Step 7: Check Missing Values, Duplicates and remove outliers using built in method
Step 8: Stop the program
Program

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LogisticRegression
import joblib  # sklearn.externals.joblib was removed from scikit-learn

df = pd.read_csv('C:/Users/praveen/Downloads/diabetes.csv')
count = df['Glucose'].value_counts()
display(count)

df.head()

df.describe()

df.mean()

df.mode()

df.var()

df.std()

df.skew()

Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64

df.kurtosis()

Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072

Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
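
To see what these univariate measures say about a single column, they can be computed on a small series where the answers are easy to verify. The readings below are made up for illustration; the long right tail is deliberate so that the skewness comes out positive.

```python
import pandas as pd

# Made-up glucose-like readings; the value 180 creates a long right tail.
s = pd.Series([85, 90, 90, 95, 100, 105, 110, 180])

summary = {
    "mean": s.mean(),
    "median": s.median(),
    "mode": s.mode().tolist(),   # mode() returns a Series because ties are possible
    "variance": s.var(),         # sample variance (ddof=1)
    "std": s.std(),              # sample standard deviation (ddof=1)
    "skewness": s.skew(),        # > 0 means a longer right tail
    "kurtosis": s.kurt(),        # excess kurtosis; about 0 for normal data
}
for name, value in summary.items():
    print(name, value)
```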

corr = df.corr()

sns.heatmap(corr,
            xticklabels=corr.columns,
            yticklabels=corr.columns)

sns.countplot(x='Outcome', data=df)
plt.show()

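# Computing the percentage of diabetic and non-diabetic in the sample
Out1 = (df.Outcome == 1).sum()   # number of diabetic rows
Out0 = (df.Outcome == 0).sum()   # number of non-diabetic rows
Total = Out0 + Out1
PC_of_1 = Out1 * 100 / Total
PC_of_0 = Out0 * 100 / Total
PC_of_1, PC_of_0
# about (34.9, 65.1) for the 768-row Pima file (268 diabetic, 500 non-diabetic)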

plt.figure(dpi = 120,figsize= (5,4))


mask = np.triu(np.ones_like(df.corr(),dtype = bool))

sns.heatmap(df.corr(),mask = mask, fmt = ".2f",annot=True,lw=1,cmap = 'plasma')


plt.yticks(rotation = 0)

plt.xticks(rotation = 90)
plt.title('Correlation Heatmap')
plt.show()
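
Items b and c of this exercise ask for regression modelling, which the listing above does not cover. As a minimal sketch under stated assumptions, a multiple linear regression can be fitted by ordinary least squares with NumPy alone. The data below are synthetic, the variable names glucose and bmi only stand in for two predictor columns, and a statistics package such as statsmodels would normally be used for a full analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors standing in for two columns such as Glucose and BMI.
n = 200
glucose = rng.normal(120, 30, n)
bmi = rng.normal(32, 7, n)
# Synthetic response built from known coefficients plus noise.
y = 5.0 + 0.8 * glucose + 1.5 * bmi + rng.normal(0, 1.0, n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), glucose, bmi])
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

predicted = X @ coef
r_squared = 1 - ((y - predicted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(coef, r_squared)
```

The fitted coefficients should land close to the values used to generate the data (0.8 and 1.5), which is a quick way to check that the design matrix was built correctly.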

RESULT:

Thus various exploratory data analysis has been performed on Pima Indians
Diabetes dataset using Python Programming successfully.

Exp. No. 6. Apply and explore various plotting functions on UCI data sets.
DATE:
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
Aim:

To apply various plotting functions on UCI data set using Python Programming
Algorithm

Step 1: Start the program

Step 2: Import the required packages

Step 3: Load Files (excel/csv/ text) into a Data Frame from UCI data set

Step 4: Describe the data set using built in method


Step 5: Compute Frequency, Mean, Median, Mode, Variance, Standard Deviation,

Step 6: Visualize the data set by exploring various plotting functions on UCI data
sets for the following

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots


d. Histograms

e. Three-dimensional plotting

Step 7: Analyse the sample data and do the required operations


Step 8: Stop the program
a. Normal curves
Program
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

df = pd.read_csv("C:/Users/praveen/Downloads/dataset_diabetes/diabetic_data.csv")
df.head()

mean = df['time_in_hospital'].mean()
std = df['time_in_hospital'].std()
x_axis = np.arange(1, 10, 0.01)
plt.plot(x_axis, norm.pdf(x_axis, mean, std))
plt.show()

Output
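
The curve drawn by norm.pdf follows the normal density formula, f(x) = exp(-(x - mean)^2 / (2 std^2)) / (std sqrt(2 pi)). The sketch below evaluates that formula directly with NumPy; the mean and standard deviation are hypothetical values chosen for illustration, not figures taken from the data set.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std**2) at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2 * np.pi))


# Hypothetical mean and standard deviation, not values from the data set.
mean, std = 4.4, 3.0
x_axis = np.linspace(mean - 5 * std, mean + 5 * std, 2001)
density = normal_pdf(x_axis, mean, std)

area = (density * (x_axis[1] - x_axis[0])).sum()   # Riemann sum, close to 1
peak_x = x_axis[density.argmax()]                  # the curve peaks at the mean
print(round(area, 4), round(peak_x, 2))
```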

b. Density and contour plots


Program
df.time_in_hospital.plot.density(color='green')
plt.title('Density plot for time_in_hospital')
plt.show()

df.num_lab_procedures.plot.density(color='green')

plt.title('Density Plot for num_lab_procedures')

plt.show()

df.num_medications.plot.density(color='green')

plt.title('Density Plot for num_medications')

plt.show()

Output

Program

# for 'number_emergency' attribute
# using plot.kde()
df.number_emergency.plot.kde(color='green')
plt.title('KDE-Density plot for number_emergency')
plt.show()

Output

Program

def func(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2

# generate 50 values between 0 and the mean (x) and between 0 and the std (y)
mean = df['time_in_hospital'].mean()
std = df['time_in_hospital'].std()
x = np.linspace(0, mean)
y = np.linspace(0, std)

# Generate combination of grids
X, Y = np.meshgrid(x, y)
Z = func(X, Y)

# Draw rectangular contour plot
plt.contour(X, Y, Z, cmap='gist_rainbow_r');

Output
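
A contour plot needs a value of Z for every (x, y) pair, which is what np.meshgrid provides. The sketch below uses a deliberately small grid with made-up ranges so that the shapes are easy to follow.

```python
import numpy as np


def func(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2


x = np.linspace(0, 2, 4)      # 4 points along x
y = np.linspace(0, 1, 3)      # 3 points along y

X, Y = np.meshgrid(x, y)      # both have shape (len(y), len(x))
Z = func(X, Y)                # evaluated at every (x, y) pair

print(X.shape, Y.shape, Z.shape)
print(Z[0, 0])                # func(0, 0) = sin(0)**2 + cos(0)**2 = 1.0
```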

c. Correlation and scatter plots


Program
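import seaborn as sb
plt.figure(figsize=(20, 10))
# numeric_only=True (pandas 1.5+) leaves out the text columns
dataplot = sb.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)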


Output

d. Histograms
Program

df.hist(figsize=(12,12),layout=(5,3))
Output

# plotting histogram for num_lab_procedures using distplot()
# (sb is seaborn; newer seaborn releases deprecate distplot in favour of histplot)
sb.distplot(a=df.num_lab_procedures, kde=False)
# visualizing plot using matplotlib.pyplot library
plt.show()

Output

e. Three dimensional plotting
Program

fig = plt.figure()
ax = plt.axes(projection='3d')

# one column per axis
x = df['number_emergency']
y = df['number_inpatient']
z = df['number_outpatient']

ax.plot3D(x, y, z, 'green')
ax.set_title('3D line plot diabetes dataset')
plt.show()

Output

RESULT:

Thus various plotting functions have been applied on the UCI data set using Python
Programming successfully.

Exp. No. 7.

DATE: Visualizing Geographic Data with Basemap


Aim:
To visualize Geographic Data using BaseMap module in Python Programming
Algorithm:

Step 1: Start the program

Step 2: Import the required packages

Step 3: Visualize Geographic Data with Basemap

Step 4: Display the Base map using built in method like basemap along with latitude
and longitude parameters

Step 5: Display the coastlines and country boundaries using built in
methods

Step 6: Fill the continents and the map boundary with suitable colours
Step 7: Create a global map with a Cylindrical Equidistant Projection, Orthographic
Projection, Robinson Projection

Step 8: Stop the program


Create a global map with an Ortho Projection
Program
%matplotlib inline
import numpy as np

import matplotlib.pyplot as plt

from mpl_toolkits.basemap import Basemap


plt.figure(figsize=(8, 8))

m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)


m.bluemarble(scale=0.5);

Output

Program

fig = plt.figure(figsize=(8, 8))


m = Basemap(projection='lcc', resolution=None,

width=8E6, height=8E6,

lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting


x, y = m(-122.3, 47.6)   # longitude, latitude of Seattle
plt.plot(x, y, 'ok', markersize=5)

plt.text(x, y, ' Seattle', fontsize=12);

Output

Create a global map with coastlines
Program

fig = plt.figure(figsize = (12,12))


m = Basemap()
m.drawcoastlines()
plt.title("Coastlines", fontsize=20)
plt.show()

Program

fig = plt.figure(figsize = (12,12))


m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='dashed', color='red')

plt.title("Coastlines", fontsize=20)
plt.show()

Create a global map with country boundaries
Program

fig = plt.figure(figsize = (12,12))


m = Basemap()

m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')


m.drawcountries()
plt.title("Country boundaries", fontsize=20)

x, y = m(78.9, 20.6)   # approximate longitude, latitude of the centre of India

plt.plot(x, y, 'ok', markersize=5)

plt.text(x, y, ' INDIA', fontsize=12);

plt.show()

Output

Create a global map with a Mercator Projection


Program

fig = plt.figure(figsize = (10,8))

m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)

m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')

m.drawcountries(linewidth=1, linestyle='solid', color='k' )

m.drawmapboundary(fill_color='lightblue')

plt.title("Mercator Projection", fontsize=20)

Output
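
The Mercator program above stops at latitudes of -80 and 80 degrees because the projection stretches without limit towards the poles. The sketch below shows the spherical Mercator formula with the standard library only; it is an illustration of the mathematics, not Basemap's own implementation, and the function name mercator_xy and the Earth radius constant are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def mercator_xy(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to spherical Mercator x, y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


for lat in (0, 30, 60, 80):
    x, y = mercator_xy(0, lat)
    print(lat, round(y / 1000), "km north of the equator on the map")
```

Equal steps in latitude map to ever larger steps in y, which is why areas near the poles look enlarged on a Mercator map.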

Create a global map with a Cylindrical Equidistant Projection


Program

fig = plt.figure(figsize = (10,8))


m = Basemap(projection='cyl',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title(" Cylindrical Equidistant Projection", fontsize=20)

Output

Create a global map with Orthographic Projection


Program
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='ortho', lon_0 = 25, lat_0 = 10)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title("Orthographic Projection", fontsize=18)

Output

Create a global map with a Robinson Projection


Program

fig = plt.figure(figsize = (10,8))


m = Basemap(projection='robin',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,
lon_0 = 0, lat_0 = 0)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title(" Robinson Projection", fontsize=20)

Output

RESULT
Thus Geographic Data has been visualized using BaseMap module in Python
Programming successfully.