Fdsa Manual

The AD3411 Data Science and Analytics Lab Manual outlines the course structure for the AI & DS branch, detailing outcomes such as data handling with Python libraries like Numpy and Pandas, and performing various analytics techniques. It includes a list of experiments that cover topics like data exploration, inferential analytics, and model building. Additionally, the manual provides installation instructions and sample code for essential Python packages used in data science.

Uploaded by

lisha


AD3411 DATA SCIENCE AND ANALYTICS LAB MANUAL

Subject Code & Name: AD3411 / DATA SCIENCE AND ANALYTICS LABORATORY

Branch: AI & DS

Year/Semester: II/IV

Course Outcomes:

1 Write Python programs to handle data using NumPy and Pandas

2 Perform descriptive analytics

3 Perform data exploration using Matplotlib

4 Perform inferential data analytics

5 Build models of predictive analytics

List of Equipment
TOOLS:
Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh

Ponjesly College of Engineering Page 1


LIST OF EXPERIMENTS

S.No  Description of Experiment

1     Working with Pandas data frames
2     Basic plots using Matplotlib
3     Frequency distributions, Averages, Variability
4     Normal curves, Correlation and scatter plots
5     Regression
6     Z-test
7     T-test
8     ANOVA
9     Building and validating linear models
10    Building and validating logistic models


INDEX

Sl. No.   Date   Program Name   Mark   Signature


Ex.No.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

1a. Aim:

To download, install and explore the features of NumPy package.

Problem Description
Python is an open-source, object-oriented language. One of its strengths is the
wide range of external packages that can be installed to extend its functionality.
Each package is a collection of modules and functions written for reuse in Python
scripts. NumPy is one such package; it simplifies array computations. To install
these packages we use pip, the Python package installer. pip is installed
automatically with recent versions of Python, and it can be run from the command
line to install packages from PyPI.
NumPy
NumPy (Numerical Python) is an open-source library for the Python
programming language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level
functions for working with arrays.
Prerequisites

• Access to a terminal window/command line
• A user account with sudo privileges
• Python installed on your system

Downloading and installing Numpy:

NumPy is a general-purpose array-processing package that provides tools for
handling n-dimensional arrays, along with comprehensive mathematical functions
and linear algebra routines. Use the command below to install NumPy:

pip install numpy

output:

Sample python program using numpy:

import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)


OUTPUT


1b. Aim: To download, install and explore the features of the Jupyter package.

Data Science:
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject
matter expertise to uncover actionable insights hidden in an organization’s data.

Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and
share documents that contain live code, equations, visualizations, and narrative text.
Uses include data cleaning and transformation, numerical simulation, statistical
modeling, data visualization, machine learning, and much more.
Jupyter supports over 40 programming languages, Python among them. Python itself
is required to install the Jupyter Notebook (this manual cites Python 3.3 or
greater, or Python 2.7; current Jupyter releases require Python 3).

Procedure:

pip is a package management system used to install and manage software packages
and libraries written in Python. These packages are stored in a large online
repository called the Python Package Index (PyPI). pip uses PyPI as the default
source for packages and their dependencies.

Installing Jupyter Notebook using pip:

To install Jupyter using pip, we first need to make sure that pip is up to date
on our system. Use the following command to update pip:

python -m pip install --upgrade pip

After updating the pip version, follow the instructions provided below to install Jupyter:

Command to install Jupyter:

python -m pip install jupyter

Finished Installation:
Use the following command to launch Jupyter using command-line:

jupyter notebook


Launching Jupyter Notebook

Click New, select Python 3 (ipykernel), and type the following program. Click
Run to execute the program.

Running the Python program:

Program to find the area of a triangle

# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area using Heron's formula
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)

Output:

The area of the triangle is 14.70

1c. Aim:

To download, install and explore the features of Scipy package.

Problem Description

SciPy is a Python library for solving a wide range of mathematical and scientific
problems. It is built on top of the NumPy library and extends it with routines
for tasks such as finding the rank or inverse of a matrix, solving polynomial
equations, and performing LU decomposition. Using its high-level functions
significantly reduces the complexity of the code and helps in analyzing the data.
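
As a small illustration of the matrix routines mentioned above, the following sketch computes the rank, inverse and LU decomposition of a 2x2 matrix. The matrix values are made up for this example, and the rank is taken from numpy.linalg.matrix_rank (an assumption of this sketch, not something the manual prescribes), since SciPy builds on NumPy.

```python
import numpy as np
from scipy import linalg

# Illustrative matrix (values chosen only for this example)
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

rank = np.linalg.matrix_rank(A)   # number of linearly independent rows
A_inv = linalg.inv(A)             # matrix inverse
P, L, U = linalg.lu(A)            # LU decomposition with A = P @ L @ U

print("Rank:", rank)
print("Inverse:\n", A_inv)
print("L:\n", L)
print("U:\n", U)
```

Multiplying P, L and U back together should reproduce A, which is a quick way to check the decomposition.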

Downloading and Installing Scipy:

Use the command below to install the SciPy package on Windows:

pip install scipy

output

Sample python code using Scipy:

Type the program in Jupyter notebook

from scipy import special

a = special.exp10(3)   # 10 raised to the power 3
print(a)
b = special.exp2(3)    # 2 raised to the power 3
print(b)
c = special.sindg(90)  # sine of 90 degrees
print(c)
d = special.cosdg(45)  # cosine of 45 degrees
print(d)

Output:


1d. Aim:
To download, install and explore the features of the Pandas package.

Problem Description

Pandas is one of the most popular open-source libraries available for Python.
It is fast and easy to use for data analysis and manipulation. The Pandas
DataFrame is among the most useful data structures available in any library.
Pandas is used in every data-intensive field, including but not limited to
scientific computing, data science, and machine learning.

The library is not included with a regular install of Python. To use it, you
must install Pandas separately.

Installing Pandas on Windows

There are two ways of installing Pandas on Windows.

Method #1: Installing with pip

pip is a package installation manager that makes installing Python libraries
and frameworks straightforward.

As long as you have a recent version of Python installed (Python 3.4 or later),
pip is installed on your computer along with Python by default.

However, if you are using an older version of Python, you will need to install
pip on your computer before installing Pandas.

Step #1: Launch Command Prompt

Press the Windows key on your keyboard or click on the Start button to open the
start menu. Type cmd, and the Command Prompt app should appear as a listing
in the start menu.

Step #2: Enter the Required Command

After you launch the Command Prompt, type the command that starts the
installation:

pip install pandas

This runs the pip installer. The required files are downloaded, and Pandas is
then ready to use on your computer.

The Pandas package is successfully installed.

Sample program

# to be typed in a Jupyter notebook

import pandas as pd
data = pd.DataFrame({"x1": ["y", "x", "y", "x", "x", "y"],
                     "x2": range(16, 22),
                     "x3": range(1, 7),
                     "x4": ["a", "b", "c", "d", "e", "f"],
                     "x5": range(30, 24, -1)})
print(data)

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data ={'first':s1, 'second':s2, 'third':s3}
dfseries = pd.DataFrame(Data)
print(dfseries)


1e. Aim:

To download, install and explore the features of Statsmodels package.

Problem Description:

Statsmodels is a popular library in Python that enables us to estimate and
analyze various statistical models. It is built on numeric and scientific
libraries like NumPy and SciPy.

Some of the essential features of this package are:

1. It includes various linear regression models such as ordinary least squares,
generalized least squares, weighted least squares, etc.
2. It provides efficient functions for time series analysis.
3. It also has some datasets for examples and testing.
4. Models based on survival analysis are also available.
5. A wide range of statistical tests is available.

Installing Statsmodels

Check the version of python installed in the PC.

Using Command Prompt

Type 'Command Prompt' on the taskbar's search pane and you'll see its icon. Click
on it to open the command prompt.

Also, you can directly click on its icon if it is pinned on the taskbar.

1. Wait until the Command Prompt window is visible on your screen.

2. Type python --version and press Enter.

3. The version installed on your system is displayed on the next line.

Installation of statsmodels

To install statsmodels on our system, open the Command Prompt, type the
following command and press Enter.

pip install statsmodels

Output

Next, we look at a program that imports statsmodels.

Here we prepare data for OLS (Ordinary Least Squares) regression. This technique
chooses the model coefficients that minimize the sum of squared differences
between the calculated (fitted) values and the observed values.


Program

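import statsmodels.api as sm

# Download the Guerry dataset from the Rdatasets collection (needs internet access)
df = sm.datasets.get_rdataset("Guerry", "HistData").data
# Keep only the columns of interest
columns = ['Department', 'Lottery', 'Literacy', 'Wealth', 'Region']
df = df[columns]
# Show the last five rows
print(df[-5:])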

OUTPUT

Result:

Thus the various packages are downloaded and installed, and their features are explored.

Ex.No.2 WORKING WITH NUMPY ARRAYS

Aim:
Write a Python program to show the working of NumPy arrays in Python.
a) Use a NumPy array to demonstrate basic array characteristics
b) Create a NumPy array using a list and a tuple
c) Apply basic operations (+, -, *, /) and find the transpose of a matrix
d) Perform sorting operations with NumPy arrays

Problem Description
Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed
by a tuple of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is the rank.
• NumPy’s array class is called ndarray. It is also known by the alias array.

Example 1:
Write a python program to demonstrate the basic NumPy array
characteristics

import numpy as np

# Creating array object
arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
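
Item b) of the aim asks for arrays built from a list and from a tuple. A minimal sketch, with values chosen only for illustration, could look like this:

```python
import numpy as np

from_list = np.array([1, 2, 3, 4])          # array from a list
from_tuple = np.array((5, 6, 7, 8))         # array from a tuple
from_nested = np.array([(1, 2), (3, 4)])    # 2-D array from a list of tuples

print(from_list, type(from_list))
print(from_tuple, from_tuple.shape)
print(from_nested, from_nested.ndim)
```

In each case np.array copies the values into a new ndarray, so later changes to the original list or tuple do not affect the array.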

Example 2: Basic operations

Arithmetic operations on NumPy arrays

Program:
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])
print("Addition")
print(array1 + array2)
print("-" * 20)
print("Subtraction")
print(array1 - array2)
print("-" * 20)
print("Multiplication")
print(array1 * array2)
print("-" * 20)
print("Division")
print(array2 / array1)
print("-" * 40)
print(array1 ** array2)
print("-" * 40)


a = np.array([1, 2, 5, 3])
print ("Adding 1 to every element:", a+1)
print ("Subtracting 3 from each element:", a-3)
print ("Multiplying each element by 10:", a*10)
print ("Squaring each element:", a**2)
a *= 2
print ("Doubled each element of original array:", a)
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)

Output
Addition
[[ 8 10 12]
[14 16 18]]
Subtraction
[[-6 -6 -6]
[-6 -6 -6]]
Multiplication
[[ 7 16 27]
[40 55 72]]
Division
[[7. 4. 3. ]
[2.5 2.2 2. ]]
[[ 1 256 19683]
[ 1048576 48828125 -2118184960]]

Adding 1 to every element: [2 3 6 4]

Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]


Original array:

[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
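
The negative value in the last row of the power result above appears to be an integer overflow: 6 ** 12 = 2176782336, which is larger than the biggest signed 32-bit integer (2147483647). This suggests the output was produced on a system where NumPy's default integer is 32 bits wide. The sketch below, which assumes explicit int32 and int64 conversions purely for demonstration, reproduces both behaviours.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])
exponent = np.array([[7, 8, 9], [10, 11, 12]])

# 32-bit integers: 6 ** 12 does not fit and silently wraps around
wrapped = base.astype(np.int32) ** exponent.astype(np.int32)
# 64-bit integers: the result fits
exact = base.astype(np.int64) ** exponent.astype(np.int64)

print(wrapped[1, 2])
print(exact[1, 2])
```

When large powers are expected, passing dtype=np.int64 (or a float dtype) when creating the arrays avoids the wrap-around.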

Example 3: Sorting array: There is a simple np.sort method for sorting NumPy
arrays. Let’s explore it a bit.
Program:
import numpy as np
a = np.array([[1, 4, 2],[3, 4, 6],[0, -1, 5]])

# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))

# sort array row-wise

print ("Row-wise sorted array:\n", np.sort(a, axis = 1))

# specify sort algorithm

print ("Column wise sort by applying merge-sort:\n", np.sort(a, axis = 0, kind = 'mergesort'))

# Example to show sorting of a structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n",
       np.sort(arr, order = ['grad_year', 'cgpa']))

OUTPUT
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]

Array sorted by names:

[('Aakash', 2009, 9.0) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
[('Pankaj', 2008, 7.9) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Aakash', 2009, 9.0)]

Result:

Thus the python programs are written and executed to explain the features of NumPy
array.


Ex.No.3 WORKING WITH PANDAS DATA FRAMES

Aim:
Write a Python program to work with Pandas data frames.
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a
Python package that offers various data structures and operations for
manipulating numerical data and time series. It is popular mainly because it
makes importing and analyzing data much easier. Pandas is fast and offers high
performance and productivity for users.
Pandas DataFrame
In practice, a Pandas DataFrame is usually created by loading a dataset from
existing storage such as an SQL database, a CSV file, or an Excel file. A
DataFrame can also be created from lists, from a dictionary, or from a list of
dictionaries.
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns); that is, data is
aligned in a tabular fashion in rows and columns. A DataFrame consists of three
principal components: the data, the rows, and the columns.

Creating a Pandas DataFrame

A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

Creating an empty dataframe :

A basic DataFrame, which can be created is an Empty Dataframe. An Empty
Dataframe is created just by calling a dataframe constructor.

Creating a dataframe using List:

DataFrame can be created using a single list or a list of lists.

Creating dataframe from dict of ndarray/lists:

To create a dataframe from a dict of ndarrays/lists, all the arrays must be of
the same length. If an index is passed, its length must equal the length of the
arrays. If no index is passed, the index defaults to range(n), where n is the
array length.
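
The effect of passing an index can be seen in a short sketch. The names, ages and row labels below are hypothetical values used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one list and one ndarray of equal length
data = {"Name": ["Tom", "nick", "krish", "jack"],
        "Age": np.array([20, 21, 19, 18])}

df_default = pd.DataFrame(data)                                  # index is 0..n-1
df_labeled = pd.DataFrame(data, index=["r1", "r2", "r3", "r4"])  # custom index

print(df_default)
print(df_labeled)
```

With the custom index, rows can be looked up by label, for example df_labeled.loc["r3"].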
Iterating over rows :

To iterate over rows, we can use iterrows() or itertuples(). The related
function items() (called iteritems() in older pandas versions) iterates over
columns rather than rows.
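
A minimal sketch of these iteration functions, using a small made-up DataFrame, is shown here. It assumes a pandas version in which items() is available.

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj"], "Score": [90, 40]})

# iterrows() yields (index, Series) pairs, one per row
for idx, row in df.iterrows():
    print(idx, row["name"], row["Score"])

# itertuples() yields one namedtuple per row and is usually faster
names = [t.name for t in df.itertuples(index=False)]
print(names)

# items() iterates over columns, yielding (column name, Series) pairs
columns = [label for label, _ in df.items()]
print(columns)
```

For large tables, vectorized column operations are generally preferable to any of these loops.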

Program

import pandas as pd

# Calling DataFrame constructor
print("Empty dataframe")
df = pd.DataFrame()
print(df)

print("Dataframe creation using List")
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create dataframe
df = pd.DataFrame(Data)
# Print the output.
print(df)

print("Create dataframe from dictionary of lists")
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
             'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'Score':[90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data_dict)
print(df)

# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()

OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []
Dataframe creation using List
        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
Create dataframe from dictionary of lists
     name  Degree  Score
0  aparna     MBA     90
1  pankaj     BCA     40
2  sudhir  M.Tech     80
3   Geeku     MBA     98

0 name      aparna
Degree       MBA
Score         90
Name: 0, dtype: object

1 name      pankaj
Degree       BCA
Score         40
Name: 1, dtype: object

2 name      sudhir
Degree    M.Tech
Score         80
Name: 2, dtype: object

3 name      Geeku
Degree      MBA
Score        98
Name: 3, dtype: object

Pandas Dataframe visualization

Retrieving data from the web

In[1]:

import pandas as pd
url = 'https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total_v2.xlsx?raw=True'
df = pd.read_excel(url)
df

OUTPUT:

Pandas for retrieving data from the csv file

In[2]:
import pandas as pd

data = pd.read_csv(r'C:\Users\HI\Downloads\PythonDataScienceHandbook-master\notebooks\data\iris.csv')
df = pd.DataFrame(data)
print(df)

OUTPUT:

Result:
Thus the Python program is written to show the working of Pandas DataFrames.

Ex.No.4 BASIC PLOTS USING MATPLOTLIB

Aim: Write a Python program for implementing basic plots using Matplotlib.
Matplotlib
It is a data visualization library in Python.
pyplot, a submodule of Matplotlib, is a collection of functions that help
in creating a variety of charts.
Line charts
These are used to represent the relation between two variables X and Y, each
plotted on its own axis.
First import the matplotlib.pyplot module for plotting functions. Also import
the NumPy library as required. Then define the data values x and y.
Example 4a : Python program for a simple line plot with labels and title
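
The listing for this example is not reproduced in this copy of the manual. A minimal sketch of such a plot, with made-up data values and labels, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data values (not from the manual)
x = np.array([1, 2, 3, 4, 5])
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y)                      # draw the line
ax.set_xlabel("x")                 # label the axes
ax.set_ylabel("y = x squared")
ax.set_title("Simple line plot")   # add a title
```

In a script, calling plt.show() afterwards opens the plot window; in a Jupyter notebook the figure is normally displayed automatically.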

Example 4b : Program to plot two lines on the same graph
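
The listing for this example is also missing from this copy. One possible sketch, assuming sine and cosine curves as the two lines, is:

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced points between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")   # first line
ax.plot(x, np.cos(x), label="cos(x)")   # second line on the same axes
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Two lines on the same graph")
ax.legend()                             # show which line is which
```

Each call to ax.plot adds another line to the same axes, and the label arguments feed the legend.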


Example 4c : Python Program for bar chart
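
The bar chart listing did not survive in this copy either. A minimal sketch with hypothetical subject names and marks could be:

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values
subjects = ["Maths", "Physics", "Chemistry"]
marks = [85, 72, 90]

fig, ax = plt.subplots()
bars = ax.bar(subjects, marks)   # one bar per category
ax.set_xlabel("Subject")
ax.set_ylabel("Marks")
ax.set_title("Marks by subject")
```

Using ax.barh instead of ax.bar draws the same data as horizontal bars.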


OUTPUT


Example 4d : Python Program for bar chart

OUTPUT

Result
Thus the Python program for the implementation of basic plots using
Matplotlib was written and executed successfully.

Ex.No.5 NORMAL CURVES AND CORRELATION AND SCATTER PLOTS

Aim : To implement Python programs for normal curves, correlation and
scatter plots using different Python libraries.

Prerequisites:
• Matplotlib
• Numpy
• Scipy
• Statistics

5a) Normal Curves

Algorithm
• Step1: Import the needed modules
• Step2: Create data points
• Step3: Calculate the mean and standard deviation
• Step4: Calculate the normal probability density
• Step5: Plot using the values calculated above
• Step6: Display the plot
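
The normal probability density in the steps above has the closed form f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). The following sketch evaluates that formula directly with NumPy; the helper name normal_pdf and the choice of a standard normal curve (mean 0, sd 1) are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal probability density evaluated with the closed-form formula."""
    return np.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

step = 0.01
x = np.arange(-5, 5, step)
density = normal_pdf(x, mean=0.0, sd=1.0)
area = np.sum(density) * step   # Riemann-sum approximation of the area under the curve

print("peak value:", density.max())
print("approximate area:", area)
```

The area under the curve should be very close to 1, which is a simple sanity check on any density function.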

Program

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Plot between -20 and 20 with 0.01 steps.
x_axis = np.arange(-20, 20, 0.01)

# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
Output

5b) Correlation and scatter plots

Algorithms
Step 1: Importing the libraries.
Step 2: Finding the Correlation between two variables.
Step 3: Plotting the graph using scatter plot.
Step 4: Add the title and labelling
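
Step 2 relies on the Pearson correlation coefficient. As a sketch of what that computation involves, the code below works it out from its definition for the same x and y values used in the program that follows, and compares the result with np.corrcoef.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1, 2, 3, 4, 3, 5, 4], dtype=float)

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```

Both values should agree at roughly 0.86, indicating a strong positive linear relationship.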

Program
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
# Pearson correlation between the two variables
correlation = y.corr(x)
print(correlation)

# plot the data
plt.scatter(x, y)

# fit the best-fitting straight line to the data and draw it
plt.plot(np.unique(x),
         np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')

# add the title
plt.title('Correlation')

# label the axes
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
Output

Result:
Thus the Python programs for normal curves, correlation and scatter plots
using different Python libraries were implemented and verified successfully.

Ex.No.6 REGRESSION

Aim : To implement python programs for Regression using python libraries.

Linear Regression:
- Simple linear regression is an approach for predicting a response using
a single feature.
- It is assumed that the two variables are linearly related.
- The goal is to find a linear function that predicts the response value (y) as
accurately as possible as a function of the feature or independent variable (x).
Algorithm
Step 1: Importing the libraries.
Step 2: Assign the mean for x and y.
Step 3: Calculating cross-deviation and deviation about x
Step 4: Calculating regression coefficients
Step 5: Plot the actual points as scatter plot
Step 6: Predict response vector
Step 7: plot the regression line
Step 8: Assign the labels for x and y
Step 9: Assign the values for observations / data
Step 10: Estimating the coefficients
Step 11: Plot the Regression line
Program
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
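
As a cross-check on these coefficients, the same data can be fitted with np.polyfit, and the coefficient of determination (R squared) can be computed to see how well the line describes the data. This is a sketch added for verification; R squared is not part of the program above.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)

# R squared = 1 - (residual sum of squares / total sum of squares)
y_pred = intercept + slope * x
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("b_0 =", intercept, "b_1 =", slope)
print("R squared =", r_squared)
```

The intercept and slope should match b_0 and b_1 above, and R squared comes out at about 0.95.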

Result:
Thus the Python code for regression using various Python libraries was
implemented and verified successfully.

Ex.No.7 Z-TEST

Aim : To implement python programs for Z-test.

Algorithm
Step 1: Import the libraries
Step 2: Generate a random array of 50 numbers with mean 110 (the program below
uses a standard deviation of 15/sqrt(50) rather than 15)
Step 3: Print mean and sd
Step 4: Perform the test.
Step 5: Pass the mean value under the null hypothesis; in the alternative
hypothesis we check whether the mean is larger
Step 6: The function outputs a z-score and the corresponding p-value. Compare
the p-value with alpha; if it is greater than alpha, we do not reject the
null hypothesis
Step 7: Else we reject it.

Program
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers with mean 110,
# similar to the IQ scores data we assume above.
# Note: the spread used here is 15/sqrt(50) (about 2.12), not 15.
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data, the null-hypothesis mean in the
# value parameter, and alternative='larger' to check whether the mean is larger.
ztest_Score, p_value = ztest(data, value = null_mean, alternative='larger')

# The function returns the z-score and the corresponding p-value. We compare the
# p-value with alpha: if it is greater than alpha we do not reject the null
# hypothesis, else we reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Output
mean=110.22 stdv=2.37
Reject Null Hypothesis
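
To show what the test computes, the sketch below works out a one-sample z statistic and its one-sided p-value by hand. It assumes the sample standard deviation (ddof=1) is used for the standard error, and the scores are made-up values; the helper name one_sample_z is hypothetical.

```python
import math
import numpy as np

def one_sample_z(data, null_mean):
    """z statistic and one-sided ('larger') p-value for a sample mean."""
    data = np.asarray(data, dtype=float)
    std_err = data.std(ddof=1) / math.sqrt(data.size)   # standard error of the mean
    z = (data.mean() - null_mean) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))         # P(Z > z) for a standard normal
    return z, p_value

sample = [108, 112, 109, 111, 110, 113, 107, 110]       # made-up scores
z, p = one_sample_z(sample, null_mean=100)
print("z =", z, "p =", p)
```

A large positive z with a p-value below alpha leads to rejecting the null hypothesis, as in the output above.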

Result:

Thus the Python code for the Z-test was implemented and verified successfully.

Ex.No.8 T-TEST

Aim : To implement python programs for T test.

Algorithm

Step 1: Import the library

Step 2: Create the data groups
Step 3: Perform the two sample t-test with equal variances

Program
# Python program to demonstrate how to
# perform two sample T-test
# Import the libraries
import numpy as np
import scipy.stats as stats
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])
# Perform the two sample t-test with equal variances
print(stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True))

Output

Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
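
The statistic above can be reproduced by hand with the pooled-variance formula for two independent samples. The following sketch uses the same two groups; the variable names are chosen only for this illustration.

```python
import numpy as np

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = group1.size, group2.size
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * group1.var(ddof=1) +
              (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
# t = difference of means divided by its standard error
t_stat = (group1.mean() - group2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

print("t statistic:", t_stat)
```

Since the p-value reported above (about 0.53) is well above 0.05, the test gives no evidence that the two group means differ.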

Result:

Thus the python code for T- test was implemented and verified successfully.

Ex.No.9 ANOVA

Aim: To implement a python code for ANOVA

Algorithm

Step 1: Import python library packages

Step 2: Create the data groups

Step 3: Conduct the one-way ANOVA

Program

# Importing library
from scipy.stats import f_oneway

# Performance when each of the engine oils is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# Conduct the one-way ANOVA
print(f_oneway(performance1, performance2, performance3, performance4))

Output

F_onewayResult(statistic=4.625000000000002, pvalue=0.01633645983978022)
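
The F statistic is the ratio of the between-group mean square to the within-group mean square. As a sketch, it can be recomputed by hand for the same four groups:

```python
import numpy as np

groups = [np.array([89, 89, 88, 78, 79]),
          np.array([93, 92, 94, 89, 88]),
          np.array([89, 88, 89, 93, 90]),
          np.array([81, 78, 81, 92, 82])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)          # number of groups
n = all_values.size      # total number of observations

# Sum of squares between groups and within groups
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print("F statistic:", f_stat)
```

With a p-value of about 0.016, below the usual 0.05 level, the result suggests that at least one engine oil gives a different mean performance.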

Result:

Thus the python code for ANOVA was implemented and verified successfully.

Ex.No.10 BUILDING AND VALIDATING LINEAR MODEL

Aim: To implement a linear regression model using python libraries .

Algorithm:

Step1: Import the packages and classes that you need.

Step2: Provide data to work with, and eventually do appropriate transformations.

Step3: Create a regression model and fit it with existing data.

Step4: Check the results of model fitting to know whether the model is
satisfactory.

Step5: Apply the model for predictions.
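
The program listing for this experiment is not reproduced in this copy of the manual. The steps above can be sketched with NumPy alone, as below. This is one possible realization under stated assumptions: the data values are hypothetical, the fit uses np.linalg.lstsq, and R squared is used as the check in Step 4. The original program may well have used a different library.

```python
import numpy as np

# Step 2: hypothetical data; add a column of ones for the intercept
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)
X = np.column_stack([np.ones_like(x), x])

# Step 3: fit the model by least squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Step 4: check the fit with the coefficient of determination
y_fit = X @ coef
r_squared = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

# Step 5: apply the model to new inputs
x_new = np.array([60.0, 70.0])
y_new = intercept + slope * x_new

print("intercept:", intercept, "slope:", slope)
print("R squared:", r_squared)
print("predictions:", y_new)
```

An R squared near 1 indicates a good fit; here it is about 0.72, so the line explains most but not all of the variation in these sample values.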


Program


Result: Thus the python program for linear regression model using python libraries
was implemented successfully.


Ex.No.11 BUILDING AND VALIDATING LOGISTIC MODEL

Aim: To implement a logistic regression model using python libraries .

Algorithm

Step 1: Import the required modules

Step 2: Generate the dataset

Step 3: Visualize the Data

Step 4: Split the Dataset

Step 5: Perform Logistic Regression
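
The program for this experiment is likewise missing from this copy. The sketch below follows the steps with NumPy only: it generates a synthetic one-feature dataset, splits it, and fits a logistic model by gradient descent. The data, the 75/25 split, the learning rate and the iteration count are all assumptions made for illustration, and the visualization step is omitted. The original program may have relied on a machine learning library instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 2: synthetic one-feature dataset with two classes (hypothetical data)
n_per_class = 100
x = np.concatenate([rng.normal(-2.0, 1.0, n_per_class),
                    rng.normal(2.0, 1.0, n_per_class)])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Step 4: shuffle, then hold out 25% of the samples for testing
order = rng.permutation(x.size)
x, y = x[order], y[order]
split = int(0.75 * x.size)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Step 5: fit weight w and bias b by gradient descent on the log-loss
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * x_train + b)))        # predicted probabilities
    w -= learning_rate * np.mean((p - y_train) * x_train)
    b -= learning_rate * np.mean(p - y_train)

# Validate on the held-out samples
p_test = 1.0 / (1.0 + np.exp(-(w * x_test + b)))
accuracy = np.mean((p_test >= 0.5) == y_test)
print("w =", w, "b =", b, "test accuracy =", accuracy)
```

Because the two classes are well separated, the held-out accuracy should be high; with real data, a confusion matrix gives a more detailed view of the errors.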


OUTPUT

Result: Thus the python program for logistic regression model using python
libraries was implemented successfully.

12 pages
UNIT-6 (Data Analytics and Visualization With Python)
No ratings yet
UNIT-6 (Data Analytics and Visualization With Python)
41 pages
FDS Lab Manual
No ratings yet
FDS Lab Manual
62 pages
Numpy Simply In Depth
From Everand
Numpy Simply In Depth
Ajit Singh
5/5 (1)
Machine Learning with Python: A Comprehensive Guide with a Practical Example
From Everand
Machine Learning with Python: A Comprehensive Guide with a Practical Example
MARTIN NEEL
No ratings yet
ML Lab Manual VI Sem
No ratings yet
ML Lab Manual VI Sem
35 pages
Data Ty
No ratings yet
Data Ty
59 pages
Machine Learning Lab Set1
No ratings yet
Machine Learning Lab Set1
5 pages
FDS Lab Meterial CS3361
No ratings yet
FDS Lab Meterial CS3361
30 pages
DAL EXT 1 and 2
No ratings yet
DAL EXT 1 and 2
125 pages
CS3361 Data Science Lab Manual
No ratings yet
CS3361 Data Science Lab Manual
82 pages
Mastering Python Programming: A Comprehensive Guide: The IT Collection
From Everand
Mastering Python Programming: A Comprehensive Guide: The IT Collection
Christopher Ford
5/5 (1)
Python Libraries For Data Science 1679435534
No ratings yet
Python Libraries For Data Science 1679435534
64 pages
Fds Lab Final 2nd Year
No ratings yet
Fds Lab Final 2nd Year
75 pages
C Programming Lab Notes
No ratings yet
C Programming Lab Notes
30 pages
PYTHON FOR BEGINNERS: A Comprehensive Guide to Learning Python Programming from Scratch (2023)
From Everand
PYTHON FOR BEGINNERS: A Comprehensive Guide to Learning Python Programming from Scratch (2023)
Denton Freeman
No ratings yet
ML (Lab Programs)
No ratings yet
ML (Lab Programs)
28 pages
Fdsa Lab Manual
No ratings yet
Fdsa Lab Manual
53 pages
Python Programming: Learn, Code, Create
From Everand
Python Programming: Learn, Code, Create
Sachin Naha
No ratings yet
DSL Rough Draft
No ratings yet
DSL Rough Draft
34 pages
Machine Learning - Python Libraries
No ratings yet
Machine Learning - Python Libraries
12 pages
Cs3361-Data Science Lab Manual
No ratings yet
Cs3361-Data Science Lab Manual
44 pages
Fds Merged
No ratings yet
Fds Merged
102 pages
Unit 5
No ratings yet
Unit 5
8 pages
Python Week+1 New
No ratings yet
Python Week+1 New
44 pages
CS3361-Data Science Laboratory Manual
No ratings yet
CS3361-Data Science Laboratory Manual
58 pages
Python For Data Analysis
No ratings yet
Python For Data Analysis
49 pages
Lab Manual ET Lab III
No ratings yet
Lab Manual ET Lab III
38 pages
Fdsa Lab Manual Final
No ratings yet
Fdsa Lab Manual Final
70 pages
BDA Unit 4
No ratings yet
BDA Unit 4
41 pages
BDA Unit 3
No ratings yet
BDA Unit 3
37 pages
Lab Manual - CCS366
No ratings yet
Lab Manual - CCS366
36 pages
ML Manual
No ratings yet
ML Manual
34 pages
SonarQube Users (Archive) - Java - lang.OutOfMemoryError - Java Heap Space PDF
No ratings yet
SonarQube Users (Archive) - Java - lang.OutOfMemoryError - Java Heap Space PDF
9 pages
Worksheet 3 LS6 - MIANO, REYMARK
No ratings yet
Worksheet 3 LS6 - MIANO, REYMARK
1 page
Features
No ratings yet
Features
7 pages
Sony Ericsson Product
No ratings yet
Sony Ericsson Product
34 pages
Understanding SAP EWM Wave
No ratings yet
Understanding SAP EWM Wave
8 pages
Regent College London New
No ratings yet
Regent College London New
2 pages
Defects
No ratings yet
Defects
51 pages
Seafarer Medical Certificate
No ratings yet
Seafarer Medical Certificate
2 pages
Porsche Case Study
No ratings yet
Porsche Case Study
4 pages
My MVP in Volleyball: Individual Awards: Collegiate Awards
No ratings yet
My MVP in Volleyball: Individual Awards: Collegiate Awards
1 page
K00200 - 20211027174133 - Rubrics Individual Assignment Paf3113 Sem A202
No ratings yet
K00200 - 20211027174133 - Rubrics Individual Assignment Paf3113 Sem A202
7 pages
Carolina Reaper
No ratings yet
Carolina Reaper
19 pages
Uipath - Uipath-Ardv1.V2021-01-22.Q52: Leave A Reply
No ratings yet
Uipath - Uipath-Ardv1.V2021-01-22.Q52: Leave A Reply
15 pages
CS Executive Sbec MCQ Questions With Answers
No ratings yet
CS Executive Sbec MCQ Questions With Answers
20 pages
Sony MDS-JE 520 User Manual
No ratings yet
Sony MDS-JE 520 User Manual
136 pages
TOCFL 基礎級 A2
No ratings yet
TOCFL 基礎級 A2
11 pages
FICM Unit 3
No ratings yet
FICM Unit 3
6 pages
The Act
No ratings yet
The Act
2 pages
Term 2 Year 1 Hass
No ratings yet
Term 2 Year 1 Hass
13 pages
People v. Pagal
No ratings yet
People v. Pagal
3 pages
Complete Guide To Service Learning 2
No ratings yet
Complete Guide To Service Learning 2
110 pages
Saqs Methods Cog T and D
No ratings yet
Saqs Methods Cog T and D
2 pages
Fa22 Rba 003
No ratings yet
Fa22 Rba 003
7 pages
3.1 Tuple Relational Calculus
No ratings yet
3.1 Tuple Relational Calculus
11 pages
Notes Summer 2024 - Finance and Economics Summary
No ratings yet
Notes Summer 2024 - Finance and Economics Summary
3 pages
Instant Download Understanding Race and Crime 1st Edition Colin Webster PDF All Chapter
100% (3)
Instant Download Understanding Race and Crime 1st Edition Colin Webster PDF All Chapter
84 pages
Lesson 3 Classification of Drugs 9learners
No ratings yet
Lesson 3 Classification of Drugs 9learners
50 pages
Abu Dhabi Ports Company (PJSC) : Shamal Development - New 33/11Kv Substation
No ratings yet
Abu Dhabi Ports Company (PJSC) : Shamal Development - New 33/11Kv Substation
52 pages
Recovery Is Everywhere Handout
No ratings yet
Recovery Is Everywhere Handout
3 pages