
Fdsa Manual

The AD3411 Data Science and Analytics Lab Manual outlines the course structure for the AI & DS branch, detailing outcomes such as data handling with Python libraries like Numpy and Pandas, and performing various analytics techniques. It includes a list of experiments that cover topics like data exploration, inferential analytics, and model building. Additionally, the manual provides installation instructions and sample code for essential Python packages used in data science.

Uploaded by

lisha

AD3411 DATA SCIENCE AND ANALYTICS LAB MANUAL

Subject Code & Name: AD3411 / DATA SCIENCE AND ANALYTICS LABORATORY

Branch : AI & DS

Year/Semester : II/IV

Course Outcomes :

1 Write Python programs to handle data using NumPy and Pandas
2 Perform descriptive analytics
3 Perform data exploration using Matplotlib
4 Perform inferential data analytics
5 Build models of predictive analytics

List of Equipment
TOOLS:
Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh

Ponjesly College of Engineering Page 1


LIST OF EXPERIMENTS

S.No Description of Experiment

1 Working with Pandas data frames
2 Basic plots using Matplotlib
3 Frequency distributions, Averages, Variability
4 Normal curves, Correlation and scatter plots
5 Regression
6 Z-test
7 T-test
8 ANOVA
9 Building and validating linear models
10 Building and validating logistic models


INDEX

Sl. No.   Date   Program Name   Mark   Signature


Ex.No.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY,
JUPYTER, STATSMODELS AND PANDAS PACKAGES

1a. Aim:

To download, install and explore the features of NumPy package.

Problem Description
Python is an open-source, object-oriented language. One of its many features is
the wide range of external packages that can be installed to extend its
functionality. These packages are collections of functions written as Python
scripts. NumPy is one such package; it eases array computations. To install
these Python packages we use pip, the package installer. Pip is installed
automatically along with Python. We can then use pip on the command line to
install packages from PyPI.
NumPy
NumPy (Numerical Python) is an open-source library for the Python
programming language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level functioning
tools for working with arrays.
Prerequisites
• Access to a terminal window/command line
• A user account with sudo privileges
• Python installed on your system

Downloading and installing Numpy:

Python NumPy is a general-purpose array processing package that
provides tools for handling n-dimensional arrays. It provides various
computing tools such as comprehensive mathematical functions and linear
algebra routines. Use the below command to install NumPy:

pip install numpy

output:

Sample python program using numpy:


import numpy as np
# Creating array object
arr = np.array([[1, 2, 3], [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)


OUTPUT


1b. Aim: To download, install and explore the features of the Jupyter package.

Data Science:
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject
matter expertise to uncover actionable insights hidden in an organization’s data.

Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and
share documents that contain live code, equations, visualizations, and narrative text.
Uses include data cleaning and transformation, numerical simulation, statistical
modeling, data visualization, machine learning, and much more.
Jupyter has support for over 40 different programming languages and Python is one
of them. Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing
the Jupyter Notebook itself.

Procedure:

pip is a package management system used to install and manage software
packages/libraries written in Python. These files are stored in a large online
repository called the Python Package Index (PyPI). pip uses PyPI as the default
source for packages and their dependencies.

Installing Jupyter Notebook using pip:

To install Jupyter using pip, we first need to check that pip is
up to date on our system. Use the following command to update pip:

python -m pip install --upgrade pip

After updating the pip version, follow the instructions provided below to install Jupyter:

Command to install Jupyter:

python -m pip install jupyter

Finished Installation:
Use the following command to launch Jupyter using command-line:

jupyter notebook


Launching Jupyter Notebook

Click New, select Python 3 (ipykernel) and type the following program. Click
Run to execute the program.
Running the Python program:
Python code:
Program to find the area of a triangle
# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area using Heron's formula
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)

Output:


1c. Aim:

To download, install and explore the features of Scipy package.

Problem Description

SciPy is a Python library that is useful for solving many mathematical equations
and algorithms. It is built on top of the NumPy library and extends it with
scientific routines such as matrix rank, inverse, polynomial equations and
LU decomposition. Using its high-level functions significantly reduces the
complexity of the code and helps in analyzing the data better.
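
The routines named above can be tried on a small matrix. The following is a minimal sketch, assuming a hypothetical 3x3 matrix chosen only for illustration; it uses numpy.linalg.matrix_rank together with scipy.linalg.inv and scipy.linalg.lu.

```python
import numpy as np
from scipy import linalg

# Hypothetical 3x3 matrix, chosen only for illustration
A = np.array([[4.0, 3.0, 2.0],
              [6.0, 3.0, 1.0],
              [2.0, 1.0, 5.0]])

rank = np.linalg.matrix_rank(A)   # number of linearly independent rows
A_inv = linalg.inv(A)             # matrix inverse
P, L, U = linalg.lu(A)            # LU decomposition with A = P @ L @ U

print("Rank:", rank)
print("A @ A_inv is the identity:", np.allclose(A @ A_inv, np.eye(3)))
print("P @ L @ U reproduces A:", np.allclose(P @ L @ U, A))
```

Checking the results with np.allclose, rather than comparing floats exactly, avoids false alarms from rounding error.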

Downloading and Installing Scipy:

Use the below pip command to install the SciPy package on Windows:

pip install scipy

output

Sample python code using Scipy:

Type the program in Jupyter notebook

from scipy import special

# 10 raised to the power 3
a = special.exp10(3)
print(a)
# 2 raised to the power 3
b = special.exp2(3)
print(b)
# sine of an angle given in degrees
c = special.sindg(90)
print(c)
# cosine of an angle given in degrees
d = special.cosdg(45)
print(d)

Output:


1d. Aim:
To download, install and explore the features of the Pandas package.

Problem Description

Pandas is one of the most popular open-source frameworks available for Python.
It is among the fastest and most easy-to-use libraries for data analysis and
manipulation. Pandas dataframes are some of the most useful data structures
available in any library. It has uses in every data-intensive field, including
but not limited to scientific computing, data science, and machine learning.

The library does not come included with a regular install of Python. To use
it, you must install the Pandas framework separately.

Installing Pandas on Windows


There are two ways of installing Pandas on Windows.

Method #1: Installing with pip


pip is a package installation manager that makes installing Python libraries
and frameworks straightforward.

As long as you have a newer version of Python installed (Python 3.4 or later),
pip will be installed on your computer along with Python by default.

However, if you’re using an older version of Python, you will need to
install pip on your computer before installing Pandas.

Step #1: Launch Command Prompt


Press the Windows key on your keyboard or click on the Start button to open the
start menu. Type cmd, and the Command Prompt app should appear as a listing
in the start menu.

Step #2: Enter the Required Command


After you launch the command prompt, the next step in the process is to type in
the required command to initialize pip installation.

Enter the command

pip install pandas

on the terminal. This should launch the pip installer. The required
files will be downloaded, and Pandas will be ready to run on your
computer.

The Pandas package is successfully installed.

Sample program

# to be typed in Jupyter notebook

import pandas as pd
# Build a DataFrame from a dictionary of equal-length columns
data = pd.DataFrame({"x1": ["y", "x", "y", "x", "x", "y"],
                     "x2": range(16, 22),
                     "x3": range(1, 7),
                     "x4": ["a", "b", "c", "d", "e", "f"],
                     "x5": range(30, 24, -1)})
print(data)

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data ={'first':s1, 'second':s2, 'third':s3}
dfseries = pd.DataFrame(Data)
print(dfseries)


1e. Aim:

To download, install and explore the features of Statsmodels package.

Problem Description:

Statsmodels is a popular library in Python that enables us to
estimate and analyze various statistical models. It is built on numeric and scientific
libraries like NumPy and SciPy.

Some of the essential features of this package are-

1. It includes various models of linear regression like ordinary least squares,


generalized least squares, weighted least squares, etc.
2. It provides some efficient functions for time series analysis.

3. It also has some datasets for examples and testing.

4. Models based on survival analysis are also available.

5. It provides a wide range of statistical tests that can be applied to large data sets.

Installing Statsmodels

Check the version of python installed in the PC.

Using Command Prompt

Type 'Command Prompt' on the taskbar's search pane and you'll see its icon. Click
on it to open the command prompt.

Also, you can directly click on its icon if it is pinned on the taskbar.

1. Wait until the 'Command Prompt' window is visible on your screen.

2. Type python --version and press 'Enter'.

3. The version installed on your system is displayed on the next line.


Installation of statsmodels

To install statsmodels on our system, open the Command
Prompt, type the following command and press 'Enter'.

pip install statsmodels


Output

Now let us look at a program in which we import statsmodels.

Here we prepare data for OLS (Ordinary Least Squares) regression. This technique
minimizes the sum of squared differences between the calculated (fitted)
values and the observed values.


Program

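import statsmodels.api as sm
import pandas
from patsy import dmatrices

# Download the Guerry data set from the Rdatasets repository (needs internet access)
df = sm.datasets.get_rdataset("Guerry", "HistData").data
# Keep only the columns of interest (avoid shadowing the built-in name vars)
columns = ['Department', 'Lottery', 'Literacy', 'Wealth', 'Region']
df = df[columns]
# Show the last five rows
df[-5:]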

OUTPUT

Result:

Thus the various packages were downloaded and installed, and their features were explored.


Ex.No.2 WORKING WITH NUMPY ARRAYS

Aim :
Write a Python program to show the working of NumPy arrays in Python.
a) Use NumPy array to demonstrate basic array characteristics
b) Create NumPy array using list and tuple

c) Apply basic operations (+, -, *, /) and find the transpose of the matrix

d) Perform sorting operation with NumPy arrays

Problem Description
Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed
by a tuple of non-negative integers.
• In NumPy dimensions are called axes. The number of axes is rank.
• NumPy’s array class is called ndarray. It is also known by the alias array.

Example 1:
Write a python program to demonstrate the basic NumPy array
characteristics

import numpy as np

# Creating array object


arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array


print("Array stores elements of type: ", arr.dtype)

Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
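
Part (b) of the aim, creating arrays from a list and from a tuple, can be illustrated with a short sketch. The values and variable names below are assumptions chosen only for illustration.

```python
import numpy as np

# Array created from a Python list
from_list = np.array([1, 2, 3, 4])

# Array created from a Python tuple
from_tuple = np.array((1.5, 2.5, 3.5))

# A nested list gives a two-dimensional array
from_nested = np.array([[1, 2], [3, 4]])

print(from_list, from_list.dtype)
print(from_tuple, from_tuple.dtype)
print(from_nested.shape)
```

np.array treats lists and tuples alike; the element type is inferred from the values, so the tuple of decimals becomes a float array.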

Example 2: Basic operations

Arithmetic operations on NumPy arrays

Program:
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])


array2 = np.array([[7, 8, 9], [10, 11, 12]])
print("Addition")
print(array1 + array2)
print("-" * 20)
print("Subtraction")
print(array1 - array2)
print("-" * 20)
print("Multiplication")
print(array1 * array2)
print("-" * 20)
print("Division")
print(array2 / array1)
print("-" * 40)
print(array1 ** array2)
print("-" * 40)


a = np.array([1, 2, 5, 3])
print ("Adding 1 to every element:", a+1)
print ("Subtracting 3 from each element:", a-3)
print ("Multiplying each element by 10:", a*10)
print ("Squaring each element:", a**2)
a *= 2
print ("Doubled each element of original array:", a)
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)

Output
Addition
[[ 8 10 12]
[14 16 18]]
Subtraction
[[-6 -6 -6]
[-6 -6 -6]]
Multiplication
[[ 7 16 27]
[40 55 72]]
Division
[[7. 4. 3. ]
[2.5 2.2 2. ]]
[[ 1 256 19683]
[ 1048576 48828125 -2118184960]]

Adding 1 to every element: [2 3 6 4]


Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]


Original array:

[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]

Example 3: Sorting array: There is a simple np.sort method for sorting NumPy
arrays. Let’s explore it a bit.
Program:
import numpy as np
a = np.array([[1, 4, 2],[3, 4, 6],[0, -1, 5]])

# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))

# sort array row-wise


print ("Row-wise sorted array:\n", np.sort(a, axis = 1))

# specify sort algorithm


print ("Column wise sort by applying merge-sort:\n", np.sort(a, axis = 0, kind = 'mergesort'))

# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n",
       np.sort(arr, order = ['grad_year', 'cgpa']))

OUTPUT
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]

Array sorted by names:


[('Aakash', 2009, 9.0) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
[('Pankaj', 2008, 7.9) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Aakash', 2009, 9.0)]

Result:

Thus the python programs are written and executed to explain the features of NumPy
array.


Ex.No.3 WORKING WITH PANDAS DATA FRAMES

Aim:
Write a Python program to work with Pandas data frames.
Pandas
Pandas is an open-source library that is built on top of the NumPy library. It is a
Python package that offers various data structures and operations for manipulating
numerical data and time series. It is popular mainly because it makes importing and
analyzing data much easier. Pandas is fast and offers high performance and
productivity for users.
Pandas DataFrame
In the real world, a Pandas DataFrame is usually created by loading a
dataset from existing storage such as an SQL database, a CSV file or an Excel
file. A Pandas DataFrame can also be created from lists, from a dictionary, or
from a list of dictionaries.
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns), i.e., data is
aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of
three principal components: the data, the rows, and the columns.


Creating a Pandas DataFrame


A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

Creating an empty dataframe :


A basic DataFrame, which can be created is an Empty Dataframe. An Empty
Dataframe is created just by calling a dataframe constructor.

Creating a dataframe using List:

DataFrame can be created using a single list or a list of lists.


Creating dataframe from dict of ndarray/lists:

To create a dataframe from a dict of ndarrays/lists, all the ndarrays must be of the
same length. If an index is passed, then the length of the index should be equal to
the length of the arrays. If no index is passed, then by default the index will be
range(n), where n is the array length.
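
The index rule can be seen in a short sketch. The column names and values below are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two equal-length arrays
data = {"height_cm": np.array([150, 162, 171]),
        "weight_kg": np.array([48.5, 60.2, 72.0])}

# No index passed: the default index is 0, 1, 2
df_default = pd.DataFrame(data)

# Custom index: its length must match the array length (3)
df_indexed = pd.DataFrame(data, index=["a", "b", "c"])

print(df_default)
print(df_indexed)
```

Passing an index whose length differs from the array length raises a ValueError.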
Iterating over rows :

To iterate over rows, we can use iterrows() or itertuples(). A third function,
items() (named iteritems() in older pandas versions), iterates over columns rather than rows.
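
A small sketch of the iteration functions, using a hypothetical two-row frame. It shows iterrows() and itertuples() for rows, and items() for columns (items() is the current name of the older iteritems()).

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"name": ["Tom", "Nick"], "age": [20, 21]})

# iterrows() yields (index, Series) pairs, one per row
for idx, row in df.iterrows():
    print(idx, row["name"], row["age"])

# itertuples() yields one namedtuple per row and is usually faster
names = [t.name for t in df.itertuples(index=False)]
print(names)

# items() iterates over columns as (column name, Series) pairs
columns = [col for col, _ in df.items()]
print(columns)
```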

Program

import pandas as pd

# Calling DataFrame constructor


print("Empty dataframe")
df = pd.DataFrame()
print(df)
print("Dataframe creation using List")
# list of strings


lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list


df = pd.DataFrame(lst)

print(df)
# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

# Create dataframe

df = pd.DataFrame(Data)

# Print the output.

print(df)
print("Create dataframe from dictionoary of lists")
# dictionary of lists
# (named student_data to avoid shadowing the built-in dict)
student_data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
                'Score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(student_data)
print(df)
# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()


OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []

Dataframe creation using List
0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18

Create dataframe from dictionoary of lists
name Degree Score
0 aparna MBA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku MBA 98

0 name aparna
Degree MBA
Score 90
Name: 0, dtype: object

1 name pankaj
Degree BCA
Score 40
Name: 1, dtype: object

2 name sudhir
Degree M.Tech
Score 80
Name: 2, dtype: object

3 name Geeku
Degree MBA
Score 98
Name: 3, dtype: object

Pandas DataFrame visualization
Retrieving data from the web

In[1]:

import pandas as pd
# Read an Excel workbook directly from a URL
url = 'https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total_v2.xlsx?raw=True'
df = pd.read_excel(url)
df


OUTPUT:

Pandas for retrieving data from the csv file


In[2]:
import pandas as pd

# Read the CSV file from a local path (adjust the path to your system)
data = pd.read_csv(r'C:\Users\HI\Downloads\PythonDataScienceHandbook-master\notebooks\data\iris.csv')
df = pd.DataFrame(data)
print(df)


OUTPUT:

Result:
Thus the python program is written to show the working of pandas dataframes


Ex.No.4 BASIC PLOTS USING MATPLOTLIB

Aim: Write a python program for implementing basic plots using Matplotlib.
Matplotlib
It is a data visualization library in Python.
The pyplot, a sublibrary of matplotlib, is a collection of functions that helps
in creating a variety of charts.
Line charts
These are used to represent the relation between two data series X and Y on
different axes.
First import the matplotlib.pyplot library for plotting functions. Also, import the
NumPy library as per requirement. Then define data values x and y.
Example 4a : Python program for simple line plot with labels and title
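
The program listing for this example did not survive in this copy of the manual. The following is a minimal sketch of a simple line plot with labels and a title, not the original program; the data values and the output file name line_plot.png are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data values
x = np.array([1, 2, 3, 4, 5])
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")        # line plot with point markers
ax.set_xlabel("x")
ax.set_ylabel("y = x squared")
ax.set_title("Simple line plot")
fig.savefig("line_plot.png")     # use plt.show() instead to open a window
```

In a Jupyter notebook the figure is rendered below the cell automatically, so the savefig call can be dropped.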

Example 4b : Program to plot two lines on the same graph
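
The listing for this example is also missing here. As a hedged sketch (not the original program), two lines can share one set of axes by calling plot twice; the sine and cosine curves and the file name two_lines.png are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced points over one period
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Two lines on the same graph")
ax.legend()                      # the legend uses the label of each line
fig.savefig("two_lines.png")
```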



Example 4c : Python Program for bar chart
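
No listing accompanies this example in this copy. A minimal bar chart sketch follows, assuming hypothetical subject names and marks and the output file name bar_chart.png.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values
subjects = ["Maths", "Physics", "Chemistry", "English"]
marks = [85, 72, 90, 66]

fig, ax = plt.subplots()
bars = ax.bar(subjects, marks, color="steelblue")   # one bar per subject
ax.set_xlabel("Subject")
ax.set_ylabel("Marks")
ax.set_title("Marks by subject")
fig.savefig("bar_chart.png")
```

Using ax.barh in place of ax.bar draws the same data as horizontal bars.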


OUTPUT


Example 4d : Python Program for bar chart

OUTPUT

Result
Thus the Python program for implementation of basic plots using
Matplotlib was written and executed successfully


Ex.No.5 NORMAL CURVES AND CORRELATION AND SCATTER PLOTS

Aim : To implement Python programs for normal curves, correlation and
scatter plots using different Python libraries.

Prerequisites:
• Matplotlib
• Numpy
• Scipy
• Statistics

5a) Normal Curves


Algorithm
• Step1: Import the needed modules
• Step2: Create data points
• Step3: Calculate the mean and standard deviation
• Step4: Calculate normal probability density
• Step5: Plot using above calculated values
• Step6: Display plot

Program

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Plot between -20 and 20 with 0.01 steps.
x_axis = np.arange(-20, 20, 0.01)

# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

# Plotting the normal probability density for these parameters
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
Output

5b) Correlation and scatter plots

Algorithms
Step 1: Importing the libraries.
Step 2: Finding the Correlation between two variables.
Step 3: Plotting the graph using scatter plot.
Step 4: Add the title and labelling

Program
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])

# Pearson correlation between the two variables
correlation = y.corr(x)
print(correlation)

# plotting the data
plt.scatter(x, y)

# This will fit the best line into the graph
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')

# adds the title
plt.title('Correlation')

# Labelling axes
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
Output

Result:
Thus the Python programs for normal curves, correlation and scatter plots
using different Python libraries were implemented and verified successfully.

Ex.No.6 REGRESSION

Aim : To implement python programs for Regression using python libraries.

Linear Regression:
- Simple linear regression is an approach for predicting a response using
a single feature.
- It is assumed that the two variables are linearly related.
- Find a linear function that predicts the response value (y) as accurately as
possible as a function of the feature or independent variable (x).
Algorithm
Step 1: Importing the libraries.
Step 2: Assign the mean for x and y.
Step 3: Calculating cross-deviation and deviation about x
Step 4: Calculating regression coefficients
Step 5: Plot the actual points as scatter plot
Step 6: Predict response vector
Step 7: plot the regression line
Step 8: Assign the labels for x and y
Step 9: Assign the values for observations / data
Step 10: Estimating the coefficients
Step 11: Plot the Regression line
Program
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
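
As a cross-check, the same coefficients can be obtained from NumPy's built-in least-squares polynomial fit. This is a sketch using the same observations as the program; np.polyfit with degree 1 is a substitute technique for verification, not part of the program above.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print("b_0 =", intercept)
print("b_1 =", slope)
```

The values agree with the estimated coefficients above up to floating-point rounding.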

Result:
Thus the Python code for regression using various Python libraries was
implemented and verified successfully.

Ex.No.7 Z-TEST

Aim : To implement python programs for Z-test.

Algorithm
Step 1: Import the libraries
Step 2: Generate a random array of 50 numbers having mean 110 and sd 15
Step 3: Print mean and sd
Step 4: Perform the test.
Step 5: Pass the mean value in the null hypothesis; in the alternative hypothesis we
check whether the mean is larger

Step 6: The function outputs a z-score and the corresponding p-value. Compare the
p-value with alpha; if it is greater than alpha then we do not reject the
null hypothesis

Step 7: Else we reject it.

Program
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers with mean 110 and
# standard deviation 15/sqrt(50), similar to IQ score data
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data, the null-hypothesis mean in the
# value parameter, and alternative='larger' to check whether the mean is larger.
ztest_Score, p_value = ztest(data, value = null_mean, alternative='larger')

# The function returns the z-score and the corresponding p-value. We compare the
# p-value with alpha: if it is smaller than alpha we reject the null hypothesis,
# otherwise we fail to reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Output
mean=110.22 stdv=2.37
Reject Null Hypothesis
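
The z-score and p-value can also be computed by hand, which makes the test's logic explicit. This is a minimal sketch, assuming a hypothetical sample and the sample standard deviation (ddof=1); the helper name one_sample_ztest_larger is illustrative and not part of any library.

```python
import numpy as np
from scipy.stats import norm

def one_sample_ztest_larger(data, null_mean):
    """Return (z, p) for the alternative 'mean > null_mean' (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # standard error of the mean, using the sample standard deviation
    standard_error = data.std(ddof=1) / np.sqrt(data.size)
    z = (data.mean() - null_mean) / standard_error
    p = norm.sf(z)   # upper-tail probability of the standard normal
    return z, p

# Hypothetical sample
sample = [108, 112, 109, 111, 110, 113, 107, 110]
z, p = one_sample_ztest_larger(sample, null_mean=100)
print("z = %.3f, p = %.3g" % (z, p))
```

A large positive z gives a small p-value, which leads to rejecting the null hypothesis at alpha = 0.05.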

Result:
Thus the Python code for Z-test was implemented and verified successfully.

Ex.No.8 T-TEST

Aim : To implement python programs for T test.

Algorithm

Step 1: Import the library


Step 2: Create the data groups
Step 3: Perform the two sample t-test with equal variances

Program
# Python program to demonstrate how to
# perform two sample T-test
# Import the libraries
import numpy as np
import scipy.stats as stats
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])
# Perform the two sample t-test with equal variances
print(stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True))


Output

Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
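
To see where the statistic comes from, the equal-variance t-test can be reproduced with the pooled-variance formula. This is a sketch using the same two groups; the significance level alpha = 0.05 is an assumption, since the manual does not state one for this exercise.

```python
import numpy as np
import scipy.stats as stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = group1.size, group2.size
# pooled variance under the equal-variance assumption
pooled_var = ((n1 - 1) * group1.var(ddof=1)
              + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (group1.mean() - group2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
# two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)

alpha = 0.05  # assumed significance level
print(t_stat, p_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Since the p-value is well above 0.05, the data give no evidence that the two group means differ.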

Result:

Thus the python code for T- test was implemented and verified successfully.

Ex.No.9 ANOVA

Aim: To implement a python code for ANOVA

Algorithm

Step 1: Import python library packages

Step 2: Create the data groups

Step 3: Conduct the one-way ANOVA

Program

# Importing library
from scipy.stats import f_oneway

# Performance when each of the engine oils is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# Conduct the one-way ANOVA


f_oneway(performance1, performance2, performance3, performance4)


Output

F_onewayResult(statistic=4.625000000000002, pvalue=0.01633645983978022)
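
The F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it by hand for the same four groups and compares it with f_oneway; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([89, 89, 88, 78, 79]),
          np.array([93, 92, 94, 89, 88]),
          np.array([89, 88, 89, 93, 90]),
          np.array([81, 78, 81, 92, 82])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n = len(groups), all_values.size   # number of groups, total observations

# sum of squares between groups and within groups
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print("F =", f_stat)
print("scipy F =", f_oneway(*groups).statistic)
```

With a p-value of about 0.016, below 0.05, the null hypothesis of equal mean performance across the four oils would be rejected at the 5% level.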

Result:

Thus the python code for ANOVA was implemented and verified successfully.

Ex.No.10 BUILDING AND VALIDATING LINEAR MODEL

Aim: To implement a linear regression model using python libraries .

Algorithm:

Step1: Import the packages and classes that you need.

Step2: Provide data to work with, and eventually do appropriate transformations.

Step3: Create a regression model and fit it with existing data.

Step4: Check the results of model fitting to know whether the model is
satisfactory.

Step5: Apply the model for predictions.
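
The program listing for this exercise is missing in this copy. The steps above can be sketched as follows. This is not the original program: it assumes scikit-learn is installed and uses synthetic data (y roughly equal to 3x + 4 plus noise), with all names and values chosen for illustration.

```python
# Step 1: import the packages and classes
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 2: synthetic data (an assumption): y = 3*x + 4 plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 4.0 + rng.normal(scale=1.0, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 3: create the regression model and fit it on the training data
model = LinearRegression().fit(X_train, y_train)

# Step 4: validate on the held-out test data
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2 on test data:", r2, "MSE:", mse)

# Step 5: apply the model to a new observation
print("prediction for x = 12:", model.predict(np.array([[12.0]]))[0])
```

Holding out a test set is one simple way to validate the model; an R^2 close to 1 on unseen data suggests the linear fit generalizes.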


Program


Result: Thus the python program for linear regression model using python libraries
was implemented successfully.


Ex.No.11 BUILDING AND VALIDATING LOGISTIC MODEL

Aim: To implement a logistic regression model using python libraries .

Algorithm

Step 1: Import the required modules

Step 2: Generate the dataset

Step 3: Visualize the Data

Step 4: Split the Dataset

Step 5: Perform Logistic Regression
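
No program listing survives for this exercise either. The steps above can be sketched as follows, assuming scikit-learn and Matplotlib are installed. The generated dataset, its parameters and the file name logistic_data.png are assumptions for illustration, not the manual's original program.

```python
# Step 1: import the required modules
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Step 2: generate a two-class dataset with two features (parameters are assumptions)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=1)

# Step 3: visualize the data, coloured by class
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=20)
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.set_title("Generated classification data")
fig.savefig("logistic_data.png")

# Step 4: split the dataset into training and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Step 5: fit logistic regression and validate on the test part
clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", acc)
print("Confusion matrix:\n", cm)
```

The confusion matrix shows how many test points of each class were classified correctly or incorrectly, which is more informative than accuracy alone.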


OUTPUT

Result: Thus the python program for logistic regression model using python
libraries was implemented successfully.
