Fdsa Manual
Branch : AI & DS
Year/Semester : II/IV
Course Outcomes :
List of Equipments
TOOLS:
Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh
LIST OF EXPERIMENTS
7 T-test
8 ANOVA
INDEX
Sl. No.   Date   Program Name   Mark   Signature
1a. Aim: To download, install and explore the features of the NumPy package.
Problem Description
Python is an open-source, object-oriented language. One of its strengths is the
wide range of external packages that can be installed to extend its functionality.
Each package is a collection of functions written as Python scripts. NumPy is one
such package; it simplifies array computations. Python packages are installed with
pip, the package installer that is included with Python. pip is run from the
command line and installs packages from PyPI (the Python Package Index).
NumPy
NumPy (Numerical Python) is an open-source library for the Python
programming language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level
functions for working with arrays.
Prerequisites
• Access to a terminal window/command line
• A user account with sudo privileges
• Python installed on your system
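Once NumPy has been installed with pip, the installation can be checked from Python. The following is a minimal sketch; the array values are arbitrary and chosen only for illustration.

```python
# Check that NumPy is importable and run a small array computation.
import numpy as np

print("NumPy version:", np.__version__)

# Illustrative values only: build a 1-D array and compute on it.
values = np.array([1, 2, 3, 4])
total = values.sum()    # sum of all elements
mean = values.mean()    # arithmetic mean
print("Sum:", total, "Mean:", mean)
```

If the import fails with ModuleNotFoundError, NumPy is not installed for the Python interpreter being used.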
OUTPUT
1b. Aim: To download, install and explore the features of the Jupyter package.
Data Science:
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject
matter expertise to uncover actionable insights hidden in an organization’s data.
Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and
share documents that contain live code, equations, visualizations, and narrative text.
Uses include data cleaning and transformation, numerical simulation, statistical
modeling, data visualization, machine learning, and much more.
Jupyter has support for over 40 different programming languages and Python is one
of them. Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing
the Jupyter Notebook itself.
Procedure:
After updating the pip version, follow the instructions provided below to install Jupyter:
AD 3411 DATA SCIENCE AND ANALYTICS LAB MANUAL
Finished Installation:
Use the following command to launch Jupyter using command-line:
jupyter notebook
Click New, select Python 3 (ipykernel), and type the following program. Click
Run to execute the program.
Running the Python program:
Python code:
Program to find the area of a triangle
# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)
Output:
1c. Aim: To download, install and explore the features of the SciPy package.
Problem Description
output
from scipy import special
# cosine of an angle given in degrees
d = special.cosdg(45)
print(d)
Output:
1d. Aim:
To download, install and explore the features of the Pandas package.
Problem Description
The Pandas library is not included with a regular install of Python. To use
it, you must install Pandas separately.
As long as you have a recent version of Python installed (newer than Python 3.4),
pip is installed on your computer along with Python by default.
Run the command pip install pandas on the terminal. This launches the pip
installer. The required files are downloaded, and Pandas is then ready to run
on your computer.
Sample program
import pandas as pd
data = pd.DataFrame({"x1":["y", "x", "y", "x", "x", "y"],
                     "x2":range(16, 22),
                     "x3":range(1, 7),
                     "x4":["a", "b", "c", "d", "e", "f"],
                     "x5":range(30, 24, -1)})
print(data)
s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data ={'first':s1, 'second':s2, 'third':s3}
dfseries = pd.DataFrame(Data)
print(dfseries)
1e. Aim: To download, install and explore the features of the Statsmodels package.
Problem Description:
5. All the statistical tests that we can imagine for data on a large scale are present.
Installing Statsmodels
Type 'Command Prompt' on the taskbar's search pane and you'll see its icon. Click
on it to open the command prompt.
Also, you can directly click on its icon if it is pinned on the taskbar.
3. The version installed in your system would be displayed in the next line.
Installation of statsmodels
Program
import statsmodels.api as sm
import pandas
from patsy import dmatrices
df = sm.datasets.get_rdataset("Guerry", "HistData").data
vars = ['Department', 'Lottery', 'Literacy', 'Wealth', 'Region']
df = df[vars]
df[-5:]
OUTPUT
Result:
Thus the various packages were downloaded and installed, and their features were explored.
Aim :
Write a Python program to show the working of NumPy arrays in Python.
2a) Use a NumPy array to demonstrate basic array characteristics
b) Create a NumPy array using a list and a tuple
c) Apply basic operations (+, -, *, /) and find the transpose of the matrix
Problem Description
Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed
by a tuple of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is called the rank.
• NumPy’s array class is called ndarray. It is also known by the alias array.
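The characteristics listed above can be inspected directly on an ndarray. The following is a minimal sketch, assuming a small 2x3 array built from a nested list and a 1-D array built from a tuple; the values are illustrative only.

```python
import numpy as np

# Create one array from a nested list and one from a tuple (illustrative values).
arr = np.array([[1, 2, 3], [4, 5, 6]])
from_tuple = np.array((1, 3, 2))

print("Array is of type:", type(arr))
print("No. of dimensions:", arr.ndim)    # number of axes
print("Shape of array:", arr.shape)      # length along each axis
print("Size of array:", arr.size)        # total number of elements
print("Array stores elements of type:", arr.dtype)
print("Array created from a tuple:", from_tuple)
```

The reported element type depends on the platform and NumPy version; integer arrays are commonly int64, but may be int32 on some systems.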
Example 1:
Write a python program to demonstrate the basic NumPy array
characteristics
import numpy as np
Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
Array Program:
import numpy as np
a = np.array([1, 2, 5, 3])
print ("Adding 1 to every element:", a+1)
print ("Subtracting 3 from each element:", a-3)
print ("Multiplying each element by 10:", a*10)
print ("Squaring each element:", a**2)
a *= 2
print ("Doubled each element of original array:", a)
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)
Output
Addition
[[ 8 10 12]
[14 16 18]]
Subtraction
[[-6 -6 -6]
[-6 -6 -6]]
Multiplication
[[ 7 16 27]
[40 55 72]]
Division
[[7. 4. 3. ]
[2.5 2.2 2. ]]
[[ 1 256 19683]
[ 1048576 48828125 -2118184960]]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
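The Addition, Subtraction, Multiplication and Division results in the output above are element-wise operations on two arrays of the same shape. A minimal sketch of such operations follows; the two 2x3 arrays a and b are assumptions chosen for illustration, since the listing that produced that output is not shown.

```python
import numpy as np

# Assumed inputs: two 2x3 integer arrays of the same shape.
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

addition = a + b
subtraction = a - b
multiplication = a * b    # element-wise product, not a matrix product
division = b / a          # true division returns floating-point values

print("Addition\n", addition)
print("Subtraction\n", subtraction)
print("Multiplication\n", multiplication)
print("Division\n", division)
```

For a matrix product rather than an element-wise product, NumPy provides the @ operator, which requires compatible inner dimensions.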
Example 3: Sorting array: There is a simple np.sort method for sorting NumPy
arrays. Let’s explore it a bit.
Program:
import numpy as np
a = np.array([[1, 4, 2],[3, 4, 6],[0, -1, 5]])
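# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))
# sort array row-wise
print ("Row-wise sorted array:\n", np.sort(a, axis = 1))
# sort array column-wise, specifying the sorting algorithm
print ("Column wise sort by applying merge-sort:\n", np.sort(a, axis = 0, kind = 'mergesort'))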
# Creating array
arr = np.array(values, dtype = dtypes)
OUTPUT
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]
Result:
Thus the python programs are written and executed to explain the features of NumPy
array.
Aim:
Write a Python program to work with Pandas DataFrames.
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a
Python package that offers various data structures and operations for
manipulating numerical data and time series. It is popular mainly because it
makes importing and analyzing data much easier. Pandas is fast and offers high
performance and productivity for users.
Pandas DataFrame
In practice, a Pandas DataFrame is usually created by loading a dataset from
existing storage, such as a SQL database, a CSV file, or an Excel file. A
DataFrame can also be created from lists, from a dictionary, or from a list of
dictionaries.
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns); that is, the data
is aligned in a tabular fashion in rows and columns. A DataFrame consists of
three principal components: the data, the rows, and the columns.
To create a DataFrame from a dict of ndarrays/lists, all the ndarrays must be of
the same length. If an index is passed, the length of the index must equal the
length of the arrays. If no index is passed, the index defaults to range(n),
where n is the array length.
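The index rule above can be sketched as follows. The column data is the Name/Age example used in this exercise; the index labels rank1 to rank4 are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Every list in the dict has the same length (4).
data = {"Name": ["Tom", "nick", "krish", "jack"], "Age": [20, 21, 19, 18]}

# No index passed: pandas uses the default index 0..n-1.
df_default = pd.DataFrame(data)

# Index passed (hypothetical labels): its length must equal the list length.
df_labeled = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])

print(df_default)
print(df_labeled)
```

Passing an index whose length differs from the length of the lists raises a ValueError.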
Iterating over rows :
To iterate over rows, we can use the functions iterrows() and itertuples().
The related function items() (named iteritems() before pandas 2.0) iterates
over columns rather than rows.
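Row iteration can be sketched as follows, using iterrows() and itertuples(). The small DataFrame is illustrative and reuses two records from the dictionary of lists in this exercise.

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj"],
                   "Degree": ["MBA", "BCA"],
                   "Score": [90, 40]})

# iterrows() yields (index label, row as a Series) pairs.
for label, row in df.iterrows():
    print(label, row["name"], row["Score"])

# itertuples() yields one namedtuple per row; fields are read as attributes.
names = [record.name for record in df.itertuples(index=False)]
print(names)
```

itertuples() is generally faster than iterrows() because it does not build a Series for every row.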
Program
import pandas as pd
# Calling the DataFrame constructor with no data creates an empty dataframe
df = pd.DataFrame()
print(df)
# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create dataframe
df = pd.DataFrame(Data)
print(df)
print("Create dataframe from dictionary of lists")
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'Score':[90, 40, 80, 98]}
OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []
0 name aparna
Degree MBA
Score 90
Name: 0, dtype: object
1 name pankaj
Degree BCA
Score 40
Name: 1, dtype: object
2 name sudhir
Degree M.Tech
Score 80
Name: 2, dtype: object
3 name Geeku
Degree MBA
Score 98
Name: 3, dtype: object
Pandas DataFrame visualization
Retrieving data from the web
In[1]:
import pandas as pd
url = 'https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total_v2.xlsx?raw=True'
df = pd.read_excel(url)
df
OUTPUT:
data = pd.read_csv(r'C:\Users\HI\Downloads\PythonDataScienceHandbook-master\notebooks\data\iris.csv')
df = pd.DataFrame(data)
print (df)
OUTPUT:
Result:
Thus the Python program was written to show the working of Pandas DataFrames.
Aim: Write a python program for implementing basic plots using Matplotlib.
Matplotlib
It is a data visualization library in Python.
The pyplot, a sublibrary of matplotlib, is a collection of functions that helps
in creating a variety of charts.
Line charts
Line charts are used to represent the relation between two variables, X and Y,
plotted on different axes.
First import Matplotlib.pyplot library for plotting functions. Also, import the
Numpy library as per requirement. Then define data values x and y.
Example 4a) : Python program for Simple line plot with labels and title
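A simple line plot with labels and a title can be sketched as follows. This is a minimal example, not the original listing; the data values, label text and title are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: y is twice x.
x = np.array([1, 2, 3, 4])
y = x * 2

fig, ax = plt.subplots()
ax.plot(x, y)                      # draw the line
ax.set_xlabel("X-axis")            # label the horizontal axis
ax.set_ylabel("Y-axis")            # label the vertical axis
ax.set_title("Simple line plot")   # add the title
# In a plain script, call plt.show() here; Jupyter displays the figure automatically.
```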
OUTPUT
OUTPUT
Result
Thus the Python program for implementation of basic plots using
Matplotlib was written and executed successfully.
Aim : To implement Python programs for normal curves, correlation and
scatter plots using different Python libraries.
Prerequisites:
• Matplotlib
• Numpy
• Scipy
• Statistics
Program
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
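The imports above suggest drawing a normal curve with scipy.stats.norm. The following is a minimal sketch under that assumption; the range of x values is illustrative, and the mean and standard deviation are computed from those points with the statistics module.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Illustrative data: evenly spaced points on the x-axis.
x_axis = np.arange(-20, 20, 0.01)

# Mean and sample standard deviation of the points.
mean = statistics.mean(x_axis.tolist())
sd = statistics.stdev(x_axis.tolist())

# Evaluate the normal probability density function and plot it.
density = norm.pdf(x_axis, mean, sd)
plt.plot(x_axis, density)
plt.title("Normal curve")
```

The curve peaks at the mean, and its width is controlled by the standard deviation.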
Algorithms
Step 1: Importing the libraries.
Step 2: Finding the Correlation between two variables.
Step 3: Plotting the graph using scatter plot.
Step 4: Add the title and labelling
Program
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
correlation
# plotting the data
plt.scatter(x, y)
# adding the least-squares regression line to the scatter plot
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')
# Labelling axes
plt.xlabel('x axis')
plt.ylabel('y axis')
Output
Result:
Thus the Python programs for normal curves, correlation and scatter plots
using different Python libraries were implemented and verified successfully.
Ex.No.6 REGRESSION
# putting labels
plt.xlabel('x')
plt.ylabel('y')
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
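The program above calls estimate_coef and plot_regression_line, whose definitions are not shown in full. The following is a minimal sketch of how they could be written, assuming ordinary least squares for a single predictor; the function bodies, colors and marker settings are assumptions, not the original listing.

```python
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    """Return (b_0, b_1) of the least-squares line y = b_0 + b_1 * x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    # cross-deviation of x and y, and deviation of x about its mean
    ss_xy = np.sum(y * x) - n * m_y * m_x
    ss_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx        # slope
    b_0 = m_y - b_1 * m_x      # intercept
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    """Scatter the observations and draw the fitted line."""
    plt.scatter(x, y, color="m", marker="o", s=30)
    y_pred = b[0] + b[1] * x   # predicted response
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

# Same observations as in main() above.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b = estimate_coef(x, y)
print("b_0 =", b[0], "b_1 =", b[1])
```

With these observations the sketch gives coefficients that agree with the values listed in the output above.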
Result:
Thus the Python code for regression using various Python libraries was
implemented and verified successfully.
Ex.No.7 Z-TEST
Algorithm
Step 1: Import the libraries
Step 2: Generate a random array of 50 numbers having mean 110 and sd 15
Step 3: Print mean and sd
Step 4: Perform the test.
Step 5: Pass the mean value under the null hypothesis; in the alternative
hypothesis we check whether the mean is larger
Step 6: The function returns a z-score and the p-value corresponding to that
score. Compare the p-value with alpha: if it is greater than alpha, we do not
reject the null hypothesis; otherwise we reject it
Program
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
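# assumed values: mean 110 (Step 2) and sd 15/sqrt(50), which is consistent
# with the stdv printed in the sample output
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# now we perform the test. In this function, we pass the data; in the value
# parameter we pass the mean under the null hypothesis; in the alternative
# hypothesis we check whether the mean is larger
ztest_score, p_value = ztest(data, value=null_mean, alternative='larger')
# reject the null hypothesis when the p-value is below alpha
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")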
Output
mean=110.22 stdv=2.37
Reject Null Hypothesis
Result:
Thus the Python code for the Z-test was implemented and verified successfully.
Ex.No.8 T-TEST
Algorithm
Program
# Python program to demonstrate how to
# perform two sample T-test
# Import the libraries
import numpy as np
import scipy.stats as stats
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
17, 16, 14, 19, 20, 21, 15,
15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
19, 19, 14, 17, 22, 24, 16,
13, 16, 13, 18, 15, 13])
# Perform the two sample t-test with equal variances
stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
Output
Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
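The test result above can be turned into a decision by comparing the p-value with a significance level. The following is a minimal sketch using the same two data groups; the significance level alpha = 0.05 is an assumption, since this exercise does not state one.

```python
import numpy as np
import scipy.stats as stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

alpha = 0.05  # assumed significance level

# Two-sample t-test with equal variances.
result = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)

# Reject the null hypothesis of equal means only when the p-value is below alpha.
reject_null = result.pvalue < alpha
print("Reject null hypothesis" if reject_null else "Fail to reject null hypothesis")
```

Since the p-value of about 0.53 is greater than 0.05, the null hypothesis of equal group means is not rejected.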
Result:
Thus the Python code for the T-test was implemented and verified successfully.
Ex.No.9 ANOVA
Algorithm
Program
# Importing library
from scipy.stats import f_oneway
Output
F_onewayResult(statistic=4.625000000000002, pvalue=0.01633645983978022)
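The program listing above shows only the import of f_oneway. The following is a minimal sketch of a one-way ANOVA call; the three groups are made-up scores and are not the data behind the output above, so the printed values will differ from it.

```python
from scipy.stats import f_oneway

# Hypothetical scores for three groups (illustration only).
group1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
group2 = [91, 92, 93, 85, 87, 84, 82, 88, 95, 96]
group3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

# One-way ANOVA: tests whether all group means are equal.
result = f_oneway(group1, group2, group3)
print("F statistic:", result.statistic)
print("p-value:", result.pvalue)
```

A p-value below the chosen significance level indicates that at least one group mean differs from the others.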
Result:
Thus the python code for ANOVA was implemented and verified successfully.
Algorithm:
Step4: Check the results of model fitting to know whether the model is
satisfactory.
Program
Result: Thus the python program for linear regression model using python libraries
was implemented successfully.
Algorithm
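The steps and program for this exercise are not shown. The following is a minimal sketch of fitting a logistic regression model, assuming scikit-learn's LogisticRegression; the data (hours studied versus pass/fail) is hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (one feature) and pass (1) / fail (0).
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

# Fit the model with default settings.
model = LogisticRegression()
model.fit(hours, passed)

# Predict the class and the probability of passing for two new values.
new_hours = np.array([[1.0], [4.5]])
predictions = model.predict(new_hours)
probabilities = model.predict_proba(new_hours)[:, 1]
print("Predicted classes:", predictions)
print("Probability of passing:", probabilities)
```

predict_proba returns one column per class; the second column is the estimated probability of class 1.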
OUTPUT
Result: Thus the python program for logistic regression model using python
libraries was implemented successfully.