FDS Lab Manual-1
Ex. No: 1
DATE : Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
AIM:
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages in Python.
ALGORITHM:
STEP 1: Make sure the computer is connected to the Internet, since pip downloads the NumPy, SciPy,
Jupyter, Statsmodels and Pandas packages over the network.
STEP 2: Open the Run window, type cmd and press Enter to open the Command Prompt.
STEP 3: Use the cd (change directory) command to move to the Python installation folder.
STEP 4: For example, change to C:\Users\students\AppData\Local\Programs\Python\Python36-32\Scripts.
STEP 5: After setting the path, check the Python and pip versions. Install NumPy using
pip install numpy.
STEP 6: Install SciPy using pip install scipy.
STEP 7: Install Statsmodels using pip install statsmodels.
STEP 8: Install Pandas using pip install pandas.
python --version
pip --version
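Once the installs finish, the result can also be checked from inside Python. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_versions and the package list are illustrative and not part of the original exercise.

```python
from importlib import metadata

# Distribution names as they would be passed to pip install
PACKAGES = ["numpy", "scipy", "statsmodels", "pandas", "jupyter"]

def installed_versions(names):
    """Return {package: version string, or None when the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'not installed'}")
```

Any package reported as not installed can be added with the matching pip install command from the steps above.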
RESULT:
Ex. No 2
DATE: Working with NumPy arrays
Aim
[1 2 3 4 5]
<class 'numpy.ndarray'>
(ii) Dimensions in Arrays
0-D Arrays
Program
import numpy as np
arr = np.array(42)
print(arr)
Output
42
1-D Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output
[1 2 3 4 5]
2-D Arrays
Program
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output
[[1 2 3]
[4 5 6]]
3-D arrays
Program
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Output
[[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]]
(iii) Check Number of Dimensions?
Program
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Output
0
1
2
3
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Output
1
Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Output
7
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output
[2 3 4 5]
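The slice arr[1:5] above includes index 1 and stops before index 5. The short sketch below, using the same seven-element array, also shows a step value and negative indices; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

middle = arr[1:5]        # indices 1 to 4; the stop index 5 is excluded
every_other = arr[::2]   # whole array with a step of 2
last_three = arr[-3:]    # negative indices count from the end

print(middle)        # [2 3 4 5]
print(every_other)   # [1 3 5 7]
print(last_three)    # [5 6 7]
```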
(vii) Reshaping arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
Output
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
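Reshaping only works when the new shape holds exactly the same number of elements. The sketch below, which assumes the 12 values 1 to 12 shown in the output above, lets NumPy infer one dimension with -1 and shows the error raised for an incompatible shape.

```python
import numpy as np

arr = np.arange(1, 13)        # the 12 values 1..12

auto = arr.reshape(-1, 3)     # -1 asks NumPy to infer the row count (4 here)
print(auto.shape)             # (4, 3)

error_message = None
try:
    arr.reshape(5, 3)         # 15 slots cannot hold 12 elements
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```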
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
Output
1
2
3
(ix) Joining NumPy Arrays
Program
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output
[1 2 3 4 5 6]
import numpy as np
Output
[array([1, 2]), array([3, 4]), array([5, 6])]
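The program for this step is incomplete in this copy. The output above is consistent with splitting a six-element array into three parts using np.array_split, so the following is a sketch under that assumption rather than the manual's original listing.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

parts = np.array_split(arr, 3)     # list of three sub-arrays
print(parts)

# array_split also accepts a count that does not divide the length evenly
uneven = np.array_split(arr, 4)
print([p.tolist() for p in uneven])   # [[1, 2], [3, 4], [5], [6]]
```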
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
Output
(array([3, 5, 6]),)
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
Output
[0 1 2 3]
RESULT:
Thus the array object has been explored successfully using the NumPy module in Python
programming.
Exp. No. 3.
Aim:
To work with DataFrame object using Pandas module in Python Programming
Algorithm:
Step 4: Load data into a DataFrame object, or load files (Excel/CSV) into a
DataFrame
Step 5: Display the rows and describe the data set using built-in methods.
calories duration
0 420 50
1 380 40
2 390 45
Output
calories 420
duration 50
Program
print(df.loc[[0, 1]])
Output
calories duration
0 420 50
1 380 40
"duration": [50, 40, 45]
}
Output
calories duration
day1 420 50
day2 380 40
day3 390 45
print(df.loc["day2"])
Output
calories 380
duration 40
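The named-index outputs above can be reproduced with a DataFrame built from a dictionary and an explicit index. This is a minimal sketch: the values and the labels day1 to day3 are taken from the outputs shown, while the variable name data is an assumption because the original listing is incomplete here.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
# Row labels replace the default 0, 1, 2 index
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

print(df)
print(df.loc["day2"])             # a single label returns a Series
print(df.loc[["day1", "day3"]])   # a list of labels returns a DataFrame
```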
df = pd.read_csv('data.csv')
print(df)
Output
2 60 103 135 340.0
import pandas as pd
print(pd.options.display.max_rows)
On the system used here the value is 60, which means that if the DataFrame contains
more than 60 rows, the print(df) statement will show only the headers and the
first and last 5 rows.
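The effect of the row limit can be seen without a CSV file. The sketch below uses a made-up 100-row DataFrame and pd.option_context, which changes display.max_rows only inside the with block; it assumes the default limit of 60 is in effect.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})   # 100 rows of illustrative data

short = repr(df)   # truncated, because 100 rows exceed the default limit

# Lift the limit temporarily; the previous setting is restored afterwards
with pd.option_context("display.max_rows", None):
    full = repr(df)

print(len(short.splitlines()), "lines when truncated")
print(len(full.splitlines()), "lines with the limit lifted")
```

Using a context manager avoids changing the display setting for the rest of the session, unlike assigning to pd.options.display.max_rows directly.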
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(4))
Output
0 60 110 130
1 60 117 145
2 60 103 135
3 45 109 175
4 45 117 148
(x) Print the last 5 rows of the DataFrame:
print(df.tail())
print(df.info())
Output
<class 'pandas.core.frame.DataFrame'>
1 Pulse 169 non-null int64
2 Maxpulse 169 non-null int64
3 Calories 164 non-null float64
RESULT:
Thus the DataFrame object has been explored successfully using the Pandas module in
Python programming.
Exp. No. 4.
DATE: Reading data from text files, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris
data set.
Aim:
Algorithm
Step 1: Start the program
Step 3: Load files (Excel/CSV/text) into a DataFrame from the Iris data set
Step 4: Display the rows and describe the data set using built-in methods
Step 6: Visualize the data set using histograms with distplot, heatmaps and
box plots
Step 7: Check for missing values and duplicates, and remove outliers
Step 8: Stop the program
Program
import pandas as pd
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()
Output:
Getting Information about the Dataset
df.shape
Output:
(150, 6)
df.info()
Output
df.describe()
df.isnull().sum()
Checking Duplicates
Output
df.value_counts("Species")
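The duplicate check itself is not shown in this copy. One common way to do it in pandas is with duplicated() and drop_duplicates(); the sketch below uses a small made-up frame with Iris-style column names instead of Iris.csv, so the rows and counts are illustrative only.

```python
import pandas as pd

# Four illustrative rows; the third repeats the first exactly
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1, 6.3],
    "SepalWidthCm": [3.5, 3.0, 3.5, 3.3],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

dup_mask = df.duplicated()        # True for rows that repeat an earlier row
print(dup_mask.sum())             # number of duplicate rows

deduped = df.drop_duplicates()    # keeps the first occurrence of each row
print(deduped.shape)
print(deduped.value_counts("Species"))
```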
Data Visualization
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df, )
plt.show()
Comparing Sepal Length and Sepal Width
# importing packages
import seaborn as sns
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.pairplot(df.drop(['Id'], axis = 1),
hue='Species', height=2)
Output:
Histograms
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5)
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6)
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6)
Output:
Histograms with Distplot Plot
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
Output:
Handling Correlation
df.corr(method='pearson', numeric_only=True)
Output:
Heatmaps
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.corr(method='pearson', numeric_only=True).drop(
['Id'], axis=1).drop(['Id'], axis=0),
annot=True)
plt.show()
Output:
Box Plots
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
def graph(y):
    sns.boxplot(x="Species", y=y, data=df)
plt.figure(figsize=(10,10))
# Adding the subplot at the specified
# grid position
plt.subplot(221)
graph('SepalLengthCm')
plt.subplot(222)
graph('SepalWidthCm')
plt.subplot(223)
graph('PetalLengthCm')
plt.subplot(224)
graph('PetalWidthCm')
plt.show()
Output:
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the dataset
df = pd.read_csv('Iris.csv')
sns.boxplot(x='SepalWidthCm', data=df)
Output:
Removing Outliers
Program
# Importing
import numpy as np
import pandas as pd
import seaborn as sns
# Load the dataset
df = pd.read_csv('Iris.csv')
# IQR (method= replaces the deprecated interpolation= keyword in NumPy 1.22+)
Q1 = np.percentile(df['SepalWidthCm'], 25, method='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, method='midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)
# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5 * IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1 - 1.5 * IQR))
# Removing the Outliers (row positions equal row labels for the default index)
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)
print("New Shape: ", df.shape)
sns.boxplot(x='SepalWidthCm', data=df)
Output:
RESULT:
Thus the Iris dataset has been explored and descriptively analysed successfully using
Python programming.
Exp. No. 5.
DATE: Use the diabetes data set from UCI and Pima Indians Diabetes
data set for performing the following:
Aim:
Step 3: Load files (Excel/CSV/text) into a DataFrame from the UCI and Pima Indians
Diabetes data sets
Step 4: Display the rows and describe the data set using built-in methods
Step 6: Visualize the data set using histograms with distplot, heatmaps and
box plots
Step 7: Check for missing values and duplicates, and remove outliers using built-in methods
Step 8: Stop the program
Program
import pandas as pd
import seaborn as sns
display(count)
df.head()
df.describe()
df.mean()
df.mode()
df.var()
df.std()
df.skew()
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
df.kurtosis()
Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
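To make the skewness figures less of a black box, the statistic can be computed by hand. The sketch below implements the adjusted Fisher-Pearson sample skewness, which, to the best of my understanding, is what pandas computes in skew(). The function name sample_skewness and the ten data values are made up for illustration and are not taken from the diabetes data set.

```python
import numpy as np
import pandas as pd

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)   # second central moment
    m3 = np.mean(dev ** 3)   # third central moment
    return np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

# Illustrative right-skewed values (a long tail towards large numbers)
sample = pd.Series([0, 0, 0, 30, 45, 60, 90, 120, 300, 846])

print(sample_skewness(sample))
print(sample.skew())
```

A positive value indicates a longer right tail and a negative value a longer left tail, which is how the skew() column above can be read.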
corr = df.corr()
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)
sns.countplot(x='Outcome', data=df)
plt.show()
# Computing the percentage of diabetic and non-diabetic patients in the sample
Out1 = (df.Outcome == 1).sum()  # diabetic
Out0 = (df.Outcome == 0).sum()  # non-diabetic
Total = Out0 + Out1
PC_of_1 = Out1 * 100 / Total
PC_of_0 = Out0 * 100 / Total
PC_of_1, PC_of_0
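When counting class members, note that wrapping a comparison in a list and calling len() on it always gives 1, which makes any two classes look like a 50/50 split. The sketch below uses a made-up eight-value Outcome column (not the real data set) to contrast that pitfall with summing the boolean mask, and shows value_counts(normalize=True) as a shorter alternative.

```python
import pandas as pd

# Illustrative labels only: 3 diabetic (1) and 5 non-diabetic (0)
outcome = pd.Series([1, 0, 0, 1, 0, 0, 0, 1], name="Outcome")

wrong = len([outcome == 1])         # a list holding one Series, so always 1
right = int((outcome == 1).sum())   # counts the True entries

percentages = outcome.value_counts(normalize=True) * 100
print(wrong, right)
print(percentages)
```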
plt.xticks(rotation = 90)
plt.title('Correlation Heatmap')
plt.show()
RESULT:
Thus exploratory data analysis has been performed successfully on the Pima Indians
Diabetes dataset using Python programming.
Exp. No. 6. Apply and explore various plotting functions on UCI data sets.
DATE:
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
Aim:
To apply various plotting functions on UCI data set using Python Programming
Algorithm
Step 3: Load Files (excel/csv/ text) into a Data Frame from UCI data set
Step 6: Visualize the data set by applying the following plotting functions to the
UCI data set:
a. Normal curves
e. Three-dimensional plotting
std =df['time_in_hospital'].std()
Output
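The listing for the normal curve is incomplete in this copy. As a sketch of the idea, the mean and standard deviation of a column define a normal density that can be evaluated on a grid and then plotted. The data below is synthetic (a stand-in for time_in_hospital, generated with assumed parameters), and only NumPy is used so that the calculation can be checked without a plotting backend.

```python
import numpy as np

# Synthetic stand-in for df['time_in_hospital']; loc and scale are assumptions
rng = np.random.default_rng(0)
stay = rng.normal(loc=4.4, scale=3.0, size=1000)

mean = stay.mean()
std = stay.std(ddof=1)   # ddof=1 matches the pandas default for .std()

# Evaluate the normal density on a grid spanning mean +/- 4 standard deviations
x = np.linspace(mean - 4 * std, mean + 4 * std, 200)
pdf = np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Sanity check with the trapezoid rule: the area under the curve is close to 1
area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(x))
print(round(area, 3))
```

With matplotlib available, plt.plot(x, pdf) followed by plt.show() would draw the curve.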
df.num_lab_procedures.plot.density(color='green')
plt.show()
df.num_medications.plot.density(color='green')
plt.show()
Output
Program
Output
Program
mean = df['time_in_hospital'].mean()
std = df['time_in_hospital'].std()
x = np.linspace(0, mean)
y = np.linspace(0, std)
# Generate combination of grids
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
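The definition of func does not appear in this copy. The sketch below shows how the grid is built and evaluated, using a hypothetical surface function and placeholder values for mean and std; neither is taken from the original program.

```python
import numpy as np

def func(x, y):
    # Hypothetical surface chosen only for illustration
    return np.sin(x) ** 2 + np.cos(y) ** 2

mean, std = 4.4, 3.0          # placeholder values, not computed from the data set

x = np.linspace(0, mean)      # linspace returns 50 points by default
y = np.linspace(0, std)

# meshgrid pairs every x with every y, giving two 50 x 50 coordinate arrays
X, Y = np.meshgrid(x, y)
Z = func(X, Y)                # evaluated element-wise on the grid

print(X.shape, Y.shape, Z.shape)
```

With matplotlib available, plt.contour(X, Y, Z) would draw the contour lines for this grid.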
Output
d. Histograms
Program
df.hist(figsize=(12,12),layout=(5,3))
Output
Output
e. Three dimensional plotting
Program
fig = plt.figure()
ax = plt.axes(projection = '3d')
x = df['number_emergency']
y = df['number_inpatient']
z = df['number_outpatient']
z = pd.Series(z, name='')
ax.plot3D(x, y, z, 'green')
ax.set_title('3D line plot diabetes dataset')
plt.show()
Output
RESULT:
Thus various plotting functions have been applied successfully to the UCI data set using Python programming.
Exp. No. 7.
Step 4: Display the base map using the built-in Basemap class along with latitude
and longitude parameters
Step 5: Display the coastlines and country boundaries using built-in
methods
Step 6: Fill the coastlines and country boundaries with suitable colours
Step 7: Create a global map with a Cylindrical Equidistant Projection, Orthographic
Projection and Robinson Projection
Output
Program
width=8E6, height=8E6,
lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
Output
Create a global map with coastlines
Program
Program
plt.title("Coastlines", fontsize=20)
plt.show()
Create a global map with country boundaries
Program
x, y = m(-122.3, 47.6)
plt.show()
Output
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
Output
Output
Output
Output
RESULT
Thus geographic data has been visualized successfully using the Basemap module in Python
programming.