0% found this document useful (0 votes)

542 views85 pages

Data Science Lab Manual

It is based on research Oriented documents

Uploaded by

harinikadevi9

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

542 views85 pages

Data Science Lab Manual

It is based on research Oriented documents

Uploaded by

harinikadevi9

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 85

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LABORATORY MANUAL

Sub.Code : CS3361
Sub.Name : DATA SCIENCE LABORATORY
Regulation : R2021

Prepared By, Approved By,

Ms.G.Aninthitha Dr.C.Callins Christiyana
Assistant Professor/CSE Professor & Head /CSE

CS3362 Data Science Laboratory 1

INSTITUTE VISION
To become a centre of excellence in preparing engineering with excellent technical, scientific
research and entrepreneurial abilities to contribute to the society.

INSTITUTE MISSION

1 Providing comprehensive learning environment

2 Imparting state-of-the-art technology to fulfil the needs of the students and Industry

3 Establishing Industry-Institute alliance for bilateral benefits

4 Promoting Research and Development activities

Offering student lead activities to inculcate ethics, social responsibilities,

5
entrepreneurial, and leadership skills

DEPARTMENT VISION
To become a centre of excellence in technical education and scientific research in the field of
Computer Science and Engineering for the wellbeing of the society.

DEPARTMENT MISSION
Producing graduates with a strong theoretical and practical in computer technology
1
to meet the Industry expectation.
Offering holistic learning ambience for faculty and students to investigate, apply and
2
transfer knowledge.
Inculcating interpersonal traits among the students leading to employability and
3
entrepreneurship.
4 Establishing effective linkage with the Industries for the mutual benefits
Strengthening Research activities to solve the problems related to industry and
5
society.

SYLLABUS
CS3362 Data Science Laboratory 2
COURSE COURSE NAME L T P C
CODE
CS3361 DATA SCIENCE LABORATORY 0 0 4 2
COURSE OBJECTIVES :
● To understand the python libraries for data science
● To understand the basic Statistical and Probability measures for data science.
● To learn descriptive analytics on the benchmark data sets.
● To apply correlation and regression analytics on standard data sets.
● To present and interpret data using visualization packages in Python

EXPERIMENTS
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
Descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
TOTAL: 60 Periods

COURSE OUTCOMES:
On completion of the course, students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CS3362 Data Science Laboratory 3
CO5: Present and interpret data using visualization packages in Python.

EQUIPMENT / SOFTWARE AND HARDWARE REQUIREMENT

 INTEL based desktop PC with min. 8GB RAM and 500 GB HDD, 17” or higher TFT Monitor, Keyboard
and mouse 30
 Windows 10 or higher operating system / Linux Ubuntu 20 or higher 30
 Python 3.9 or later, Anaconda Distribution
 Scipy, statmodels, seaborn, plotly

PLAN OF IMPLEMENTATION

Number
Cumulat
Sl. of
List of Experiments ive Requirement
No periods
periods
planned
Intel desktop
(8GB RAM,
500GB HDD,
Download, install and explore the features of
17”+ TFT),
NumPy, SciPy, Jupyter, Statsmodels and Windows
1. 8 8 10/Linux Ubuntu,
Pandas packages.
Python 3.9
(Anaconda),
Scipy, statmodels,
seaborn, plotly.

a. Working with Numpy arrays.

Intel desktop
b. NumPy program to convert a list of
(8GB RAM,
numeric values into a one-dimensional 500GB HDD,
17”+ TFT),
NumPy array.
8 Windows
2. 16
c. NumPy program to create an array with 10/Linux Ubuntu,
Python 3.9
values ranging from 12 to 38.
(Anaconda),
d. NumPy program to reverse an array (the first Scipy, statmodels,
seaborn, plotly.
element becomes the last).

CS3362 Data Science Laboratory 4

Number
Cumulat
Sl. of
List of Experiments ive Requirement
No periods
periods
planned
Intel desktop
Working with Pandas data frames (8GB RAM,
a. Working with Pandas data frames 500GB HDD,
b. Pandas program to add, subtract, multiple 17”+ TFT),
and divide two Pandas Series. 8 Windows
3. 24
c. Pandas program to convert a NumPy array 10/Linux Ubuntu,
to a Pandas series. Python 3.9
d. Pandas program to create a dataframe from (Anaconda),
a dictionary and display it. Scipy, statmodels,
seaborn, plotly.
Intel desktop
(8GB RAM,
500GB HDD,
17”+ TFT),
Perform Descriptive analytics on the Iris data
8 Windows
4. 36
set. 10/Linux Ubuntu,
Python 3.9
(Anaconda),
Scipy, statmodels,
seaborn, plotly.
Use the diabetes data set from UCI and Pima
Indians Diabetes data set to perform the
following:
a. Univariate analysis: Frequency, Intel desktop
(8GB RAM,
Mean, Median, Mode, Variance, Standard 500GB HDD,
Deviation, 17”+ TFT),
Windows
5. Skewness and Kurtosis. 8 44 10/Linux Ubuntu,
b. Bivariate analysis: Linear and Python 3.9
(Anaconda),
logistic regression modeling Scipy, statmodels,
c. Multiple Regression analysis seaborn, plotly.

d. Also compare the results of the above

analysis for the two data sets.

Apply and explore various plotting functions Intel desktop

(8GB RAM,
6. on UCI data sets. 8 52 500GB HDD,
a. Normal curves 17”+ TFT),
Windows

CS3362 Data Science Laboratory 5

Number
Cumulat
Sl. of
List of Experiments ive Requirement
No periods
periods
planned
b. Density and contour plots 10/Linux Ubuntu,
Python 3.9
c. Correlation and scatter plots
(Anaconda),
d. Histograms Scipy, statmodels,
seaborn, plotly.
e. Three dimensional plotting

Intel desktop
(8GB RAM,
500GB HDD,
17”+ TFT),
Visualizing the Geographic Data with Windows
7. Basemap using Zomato geographic data. 8 60 10/Linux Ubuntu,
Python 3.9
(Anaconda),
Scipy, statmodels,
seaborn, plotly.

1(a). Download and install the different packages like NumPy, SciPy,
Jupyter, Statsmodels and Pandas

AIM:
To learn how to download and install the different packages of NumPy, SciPy, Jupyter,
Statsmodels and Pandas.

ALGORITHM:
1. Download Python and Jupyter.
2. Install Python and Jupyter.
3. Install the pack like NumPy, SciPy Satsmodels and Pandas.
4. Verify the proper execution of Python and Jupyter.

Python Installation
 Open the python official web site. (https://www.python.org/)
 Downloads ==> Windows ==> Select Recent Release. (Requires Windows 10 or
above versions)
 Install "python-3.10.6-amd64.exe"

Jupyter Installation
 Open command prompt and enter the following to check whether the pyton was
installed properly or not, “python –version”.
 If installation is proper it returns the version of python
CS3362 Data Science Laboratory 6
 Enter the following to check whether the pyton package manager was installed
properly or not, “pip –version”
 If installation is proper it returns the version of python package manager
 Enter the following command “pip install jupyterlab”.
 Enter the following command “pip install jupyter notebook”.
 Copy the above command result from path to upgrade command and paste it and
execute for upgrade process.
 Create a folder and name the folder accordingly.
 Open command prompt and enter in to that folder. Enter the following code
“jupyter notebook” and then give enter.
 Now new jupyter notebook will be opened for our use.

pip Installation
Installation of NumPy
 pip install numpy
Installation of SciPy
 pip install scipy
Installation of Statsmodels
 pip install statsmodels
Installation of Pandas
 pip install pandas

Output

CS3362 Data Science Laboratory 7

RESULT:
NumPy, SciPy, Jupyter, Statsmodels and Pandas packages were installed properly and
the execution also verified.
1(b). Explore the features of NumPy

AIM:
To learn the different features provided by NumPy package.

ALGORITHM:
1. Install the NumPy package
2. Study all the features of NumPy package.

NumPy
 NumPy is a Python library used for working with arrays.
 It also has functions for working in domain of linear algebra, fourier transform,
and matrices.

Features
These are the important features of NumPy
1. Array 2. Random 3. Universal Functions

1. Arrays
Array Slicing
 Slicing in python means taking elements from one given index to another
given index.
 We pass slice instead of index like this: [start:end].
 We can also define the step, like this: [start:end:step].
 If we don't pass start its considered 0
 If we don't pass end its considered length of array in that dimension
 If we don't pass step its considered 1

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])print(arr[1:5:2])

Array Shape & Reshaping

Array Shape
NumPy arrays have an attribute called shape that returns a tuple with eachindex
having the number of corresponding elements.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

CS3362 Data Science Laboratory 8

Array Reshaping
 Reshaping means changing the shape of an array.
 The shape of an array is the number of elements in each dimension.
 By reshaping we can add or remove dimensions or change number of
elements in each dimension.
 Convert the following 1-D array with 12 elements into a 3-D array.
The outermost dimension will have 2 arrays that contains 3 arrays, each with 2elements:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)

2. Random
Random Permutations
A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of
[1, 2, 3] and vice-versa.
The NumPy Random module provides two methods for this: shuffle() and
permutation().
from numpy import random
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
random.shuffle(arr) print(arr)

Seaborn
Seaborn is a library that uses Matplotlib underneath to plot graphs. It will be used to
visualize random distributions.
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot([0, 1, 2, 3, 4, 5])
plt.show()

Normal (Gaussian) Distribution

It is also called the Gaussian Distribution after the German mathematician Carl
Friedrich Gauss. It fits the probability distribution of many events, eg. IQ Scores, Heartbeat
etc.
It uses the random.normal() method to get a Normal Data Distribution.
It has three parameters:
loc - (Mean) where the peak of the bell exists.
scale - (Standard Deviation) how flat the graph distribution should be.
size - The shape of the returned array.
Generate a random normal distribution of size 2x3 with mean at 1 and standarddeviation of
2:
from numpy import random
x = random.normal(loc=1, scale=2, size=(2, 3))
print(x)

CS3362 Data Science Laboratory 9

Binomial Distribution
Binomial Distribution is a Discrete Distribution.
It describes the outcome of binary scenarios, e.g. toss of a coin, it will either be head
or tails.
It has three parameters:
n - number of trials.
p - probability of occurence of each trial (e.g. for toss of a coin 0.5 each).
size - The shape of the returned array.
Given 10 trials for coin toss generate 10 data points:
from numpy import random
x = random.binomial(n=10, p=0.5, size=10)
print(x)

Poisson Distribution
It estimates how many times an event can happen in a specified time. e.g. If someone
eats twice a day what is probability he will eat thrice?
It has two parameters:
lam - rate or known number of occurences e.g. 2 for above problem.
size - The shape of the returned array.
Generate a random 1x10 distribution for occurence 2:
from numpy import random
x = random.poisson(lam=2, size=10)
print(x)

Uniform Distribution
Used to describe probability where every event has equal chances of occuring. E.g.
Generation of random numbers.
It has three parameters:
a - lower bound - default 0 .0.
b - upper bound - default 1.0.
size - The shape of the returned array.
Create a 2x3 uniform distribution sample:
from numpy import random
x = random.uniform(size=(2, 3))
print(x)

Logistic Distribution
Logistic Distribution is used to describe growth.
Used extensively in machine learning in logistic regression, neural networks etc.
It has three parameters:
loc - mean, where the peak is. Default 0.
scale - standard deviation, the flatness of distribution. Default 1.
size - The shape of the returned array.

CS3362 Data Science Laboratory 1

0
Draw 2x3 samples from a logistic distribution with mean at 1 and stddev 2.0:from
numpy import random
x = random.logistic(loc=1, scale=2, size=(2, 3))
print(x)

Multinomial Distribution
Multinomial distribution is a generalization of binomial distribution.
It describes outcomes of multi-nomial scenarios unlike binomial where scenarios must
be only one of two. e.g. Blood type of a population, dice roll outcome.
It has three parameters:
n - number of possible outcomes (e.g. 6 for dice roll).
pvals - list of probabilties of outcomes (e.g. [1/6, 1/6, 1/6, 1/6, 1/6, 1/6] for dice roll).
size - The shape of the returned array.
Draw out a sample for dice roll:
from numpy import random
x = random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
print(x)

Exponential Distribution
Exponential distribution is used for describing time till next event e.g. failure/success
etc.

It has two parameters:

scale - inverse of rate ( see lam in poisson distribution ) defaults to 1.0.
size - The shape of the returned array.
Draw out a sample for exponential distribution with 2.0 scale with 2x3 size:from
numpy import random
x = random.exponential(scale=2, size=(2, 3))
print(x)

Chi Square Distribution

Chi Square distribution is used as a basis to verify the hypothesis.
It has two parameters:
df - (degree of freedom).
size - The shape of the returned array.
Draw out a sample for chi squared distribution with degree of freedom 2 with size 2x3:from
numpy import random
x = random.chisquare(df=2, size=(2, 3))
print(x)

Rayleigh Distribution
Rayleigh distribution is used in signal processing.
It has two parameters:
scale - (standard deviation) decides how flat the distribution will be default 1.0).
size - The shape of the returned array.

CS3362 Data Science Laboratory 1

1
Draw out a sample for rayleigh distribution with scale of 2 with size 2x3:from
numpy import random
x = random.rayleigh(scale=2, size=(2, 3))
print(x)

Pareto Distribution
A distribution following Pareto's law i.e. 80-20 distribution (20% factors cause80%
outcome).
It has two parameter:
a - shape parameter.
size - The shape of the returned array.
Draw out a sample for pareto distribution with shape of 2 with size 2x3:from
numpy import random
x = random.pareto(a=2, size=(2, 3))
print(x)

Zipf Distribution
Zipf distritutions are used to sample data based on zipf's law.
Zipf's Law: In a collection the nth common term is 1/n times of the most common
term. E.g. 5th common word in english has occurs nearly 1/5th times as of the most
used word.
It has two parameters:
a - distribution parameter.
size - The shape of the returned array.
Draw out a sample for zipf distribution with distribution parameter 2 with size 2x3:from
numpy import random
x = random.zipf(a=2, size=(2, 3))
print(x)

3. Universal Functions
Create Your Own ufunc (Universal)
To create you own ufunc, you have to define a function, like you do with normal
functions in Python, then you add it to your NumPy ufunc library with the frompyfunc()
method.
The frompyfunc() method takes the following arguments:
function - the name of the function.
inputs - the number of input arguments (arrays).
outputs - the number of output arrays.
Create your own ufunc for addition:
import numpy as np
def myadd(x, y):
return x+y
myadd = np.frompyfunc(myadd, 2, 1)
print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))

CS3362 Data Science Laboratory 1

2
Simple Arithmetic
You could use arithmetic operators + - * / directly between NumPy arrays, but this
section discusses an extension of the same where we have functions that can take any array-
like objects e.g. lists, tuples etc. and perform arithmetic conditionally.
Addition
Add the values in arr1 to the values in arr2:
import numpy as np
arr1 = np.array([10, 11, 12, 13, 14, 15])
arr2 = np.array([20, 21, 22, 23, 24, 25])
newarr = np.add(arr1, arr2)
print(newarr)
Subtraction
Subtract the values in arr2 from the values in arr1:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([20, 21, 22, 23, 24, 25])
newarr = np.subtract(arr1, arr2)
print(newarr)
Multiplication
Multiply the values in arr1 with the values in arr2:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([20, 21, 22, 23, 24, 25])
newarr = np.multiply(arr1, arr2)
print(newarr)
Division
Divide the values in arr1 with the values in arr2:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 5, 10, 8, 2, 33])
newarr = np.divide(arr1, arr2)
print(newarr)
Power
Raise the valules in arr1 to the power of values in arr2:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 5, 6, 8, 2, 33])
newarr = np.power(arr1, arr2)
print(newarr)
Remainder
Return the remainders:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])

CS3362 Data Science Laboratory 13

arr2 = np.array([3, 7, 9, 8, 2, 33])
newarr = np.mod(arr1, arr2)
print(newarr)
Absolute Values
Return the quotient and mod:
import numpy as np
arr = np.array([-1, -2, 1, 2, 3, -4])
newarr = np.absolute(arr)
print(newarr)

Rounding Decimals
There are primarily five ways of rounding off decimals in NumPy:
 truncation  floor
 rounding  ceil
Truncation

Remove the decimals, and return the float number closest to zero. Use the trunc() and
fix() functions.
Truncate elements of following array:
import numpy as np
arr = np.trunc([-3.1666, 3.6667])
print(arr)

Rounding
The around() function increments preceding digit or decimal by 1 if >=5 else do
nothing.
Round off 3.1666 to 2 decimal places:
import numpy as np
arr = np.around(3.1666, 2)
print(arr)

Floor
The floor() function rounds off decimal to nearest lower integer.
Floor the elements of following array:
import numpy as np
arr = np.floor([-3.1666, 3.6667])
print(arr)

Ceil
The ceil() function rounds off decimal to nearest upper integer.
Ceil the elements of following array:
import numpy as np
arr = np.ceil([-3.1666, 3.6667])
print(arr)

CS3362 Data Science Laboratory 14

Logs
NumPy provides functions to perform log at the base 2, e and 10.
We will also explore how we can take log for any base by creating a custom ufunc.
All of the log functions will place -inf or inf in the elements if the log can not be
computed.
Find log at base 10 of all elements of following array:
import numpy as np
arr = np.arange(1, 10)
print(np.log10(arr))

Summations
Addition is done between two arguments whereas summation happens over n
elements
Add the values in arr1 to the values in arr2:
import numpy as np arr1
= np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])
newarr = np.add(arr1, arr2)
print(newarr)

Products
To find the product of the elements in an array, use the prod() function.
Find the product of the elements of this array:
import numpy as np
arr = np.array([1, 2, 3, 4])x
= np.prod(arr)
print(x)

Differences
A discrete difference means subtracting two successive elements.
To find the discrete difference, use the diff() function.
Compute discrete difference of the following array:
import numpy as np
arr = np.array([10, 15, 25, 5])
newarr = np.diff(arr)
print(newarr)

LCM (Lowest Common Multiple

The Lowest Common Multiple is the least number that is common multiple of both of
the numbers.
import numpy as np
num1 = 4
num2 = 6
x = np.lcm(num1, num2)

CS3362 Data Science Laboratory 15

print(x)

GCD (Greatest Common Denominator)

The GCD (Greatest Common Denominator), also known as HCF (Highest Common
Factor) is the biggest number that is a common factor of both of the numbers.
Find the HCF of the following two numbers:
import numpy as np
num1 = 6
num2 = 9

x = np.gcd(num1, num2)
print(x)

Trigonometric Functions
NumPy provides the ufuncs sin(), cos() and tan() that take values in radians and
produce the corresponding sin, cos and tan values.
Find sine value of PI/2:
import numpy as npx
= np.sin(np.pi/2)
print(x)

Find sine values for all of the values in arr:

import numpy as np
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])x =
np.sin(arr)
print(x)

Hyperbolic Functions
NumPy provides the ufuncs sinh(), cosh() and tanh() that take values in radians and
produce the corresponding sinh, cosh and tanh values..
Find sinh value of PI/2:
import numpy as npx
= np.sinh(np.pi/2)
print(x)

Find cosh values for all of the values in arr:

import numpy as np
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])x =
np.cosh(arr)
print(x)

Set Operations
A set in mathematics is a collection of unique elements.

CS3362 Data Science Laboratory 16

Create Sets in NumPy
We can use NumPy's unique() method to find unique elements from any array. E.g.
create a set array, but remember that the set arrays should only be 1-D arrays.
Convert following array with repeated elements to a set:
import numpy as np
arr = np.array([1, 1, 1, 2, 3, 4, 5, 5, 6, 7])
x = np.unique(arr)
print(x)
Finding Union
To find the unique values of two arrays, use the union1d() method.
Find union of the following two set arrays:
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6]) newarr
= np.union1d(arr1, arr2)
print(newarr)

Finding Intersection
To find only the values that are present in both arrays, use the intersect1d() method.
Find intersection of the following two set arrays:
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])
newarr = np.intersect1d(arr1, arr2, assume_unique=True)
print(newarr)

Output :

CS3362 Data Science Laboratory 17

Result:
Thus the Features of Numpy has been executed successfully

RESULT
Thus the feature study of NumPy was completed successfully.

CS3362 Data Science Laboratory 18

1(c). Explore the features of SciPy

AIM:
To learn the different features provided by SciPy package.

ALGORITHM:
1. Install the SciPy package
2. Study all the features of SciPy package.

SciPy
SciPy stands for Scientific Python, SciPy is a scientific computation library that uses
NumPy underneath.

Features
These are the important features of SciPy
1. Constants 2. Sparse Data 3. Graphs
4. Spatial Data 5. Matlab Arrays 6. Interpolation

1. Constants in SciPy
As SciPy is more focused on scientific implementations, it provides many built-in
scientific constants.
These constants can be helpful when you are working with Data Science.
Constants
in SciPy
Metric
Return the specified unit in meter
ex: print(constants.milli)
Binary
Return the specified unit in bytes
ex: print(constants.kibi)
Mass
Return the specified unit in kg
ex: print(constants.stone)
Angle
Return the specified unit in radians
ex: print(constants.degree)
Time
Return the specified unit in seconds
ex: print(constants.year)
Length
Return the specified unit in meters
ex: print(constants.mile)
Pressure
Return the specified unit in pascals
ex: print(constants.bar)

17
Area
Return the specified unit in square meters
ex: print(constants.hectare)
Volume
Return the specified unit in cubic meters
ex: print(constants.litre)
Speed
Return the specified unit in meters per second
ex: print(constants.kmh)
Temperature
Return the specified unit in Kelvin
ex: print(constants.zero_Celsius)
Energy
Return the specified unit in joules
ex: print(constants.calorie)
Power
Return the specified unit in watts
ex: print(constants.hp)
Force
Return the specified unit in newton
ex: print(constants.pound_force)

2. Sparse Data
Sparse data is data that has mostly unused elements (elements that don't carry any
information).
It can be an array like this one:
[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]
Sparse Data: is a data set where most of the item values are zero.
Dense Array: is the opposite of a sparse array: most of the values are not zero.

CSR(Compressed Sparse Row) Matrix

We can create CSR matrix by passing an arrray into function
scipy.sparse.csr_matrix().
Create a CSR matrix from an array:
import numpy as np
from scipy.sparse import csr_matrix arr
= np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr))

3. Graphs
Graphs are an essential data structure.
SciPy provides us with the module scipy.sparse.csgraph for working with such data
structures.

CS3362 Data Science Laborato ry 18

Adjacency Matrix
Adjacency matrix is a nxn matrix where n is the number of elements in a graph.
The values represents the connection between the elements.

Dijkstra
Use the dijkstra method to find the shortest path in a graph from one element to
another.
It takes following arguments:
return_predecessors: boolean (True to return whole path of traversal otherwise False).
indices: index of the element to return all paths from that element only.
limit: max weight of path.
Find the shortest path from element 1 to 2:
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.sparse import csr_matrix
arr = np.array([
[0, 1, 2],
[1, 0, 0],
[2, 0, 0]
])
newarr = csr_matrix(arr)
print(dijkstra(newarr, return_predecessors=True, indices=0))

Depth First Order

The depth_first_order() method returns a depth first traversal from a node.
This function takes following arguments:
the graph.
the starting element to traverse graph from.

Traverse the graph depth first for given adjacency matrix:import

numpy as np
from scipy.sparse.csgraph import depth_first_order
from scipy.sparse import csr_matrix
arr = np.array([
[0, 1, 0, 1],
[1, 1, 1, 1],
[2, 1, 1, 0],
[0, 1, 0, 1]
])
newarr = csr_matrix(arr)
print(depth_first_order(newarr, 1))

CS3362 Data Science Laborato ry 19

Breadth First Order
The breadth_first_order() method returns a breadth first traversal from a node.
This function takes following arguments:
the graph.
the starting element to traverse graph from.
Traverse the graph breadth first for given adjacency matrix:import
numpy as np
from scipy.sparse.csgraph import breadth_first_order
from scipy.sparse import csr_matrix
arr = np.array([
[0, 1, 0, 1],
[1, 1, 1, 1],
[2, 1, 1, 0],
[0, 1, 0, 1]
])
newarr = csr_matrix(arr)
print(breadth_first_order(newarr, 1))

4. Spatial Data
Spatial data refers to data that is represented in a geometric space.
E.g. points on a coordinate system.
We deal with spatial data problems on many tasks.
E.g. finding if a point is inside a boundary or not.

Triangulation
A Triangulation of a polygon is to divide the polygon into multiple triangles with
which we can compute an area of the polygon.
A Triangulation with points means creating surface composed triangles in which all of
the given points are on at least one vertex of any triangle in the surface.
One method to generate these triangulations through points is the Delaunay()
Triangulation.
Example:
Create a triangulation from following points:
import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt points
= np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1]
])
simplices = Delaunay(points).simplices

CS3362 Data Science Laboratory 20

plt.triplot(points[:, 0], points[:, 1], simplices)
plt.scatter(points[:, 0], points[:, 1], color='r')
plt.show()

Convex Hull
A convex hull is the smallest polygon that covers all of the given points.
Use the ConvexHull() method to create a Convex Hull.
Example
Create a convex hull for following points:
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt
points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1],
[1, 2],
[5, 0],
[3, 1],
[1, 2],
[0, 2] ])
hull = ConvexHull(points)
hull_points = hull.simplices
plt.scatter(points[:,0], points[:,1])
for simplex in hull_points:
plt.plot(points[simplex,0], points[simplex,1], 'k-')
plt.show()

KDTrees
KDTrees are a datastructure optimized for nearest neighbor queries.
E.g. in a set of points using KDTrees we can efficiently ask which points are nearest
to a certain given point.
The KDTree() method returns a KDTree object.
The query() method returns the distance to the nearest neighbor and the location of the
neighbors.
Example
Find the nearest neighbor to point (1,1):
from scipy.spatial import KDTree
points = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points)
res = kdtree.query((1, 1))
print(res)

CS3362 Data Science Laboratory 21

Distance Matrix
There are many Distance Metrics used to find various types of distances between two
points in data science, Euclidean distsance, cosine distsance etc.
The distance between two vectors may not only be the length of straight line between
them, it can also be the angle between them from origin, or number of unit steps required etc.
Many of the Machine Learning algorithm's performance depends greatly on distance
metrices. E.g. "K Nearest Neighbors", or "K Means" etc.
Let us look at some of the Distance Metrices:

Euclidean Distance
Find the euclidean distance between given points A and B.
Example
Find the euclidean distance between given points.
from scipy.spatial.distance import euclideanp1
= (1, 0)
p2 = (10, 2)
res = euclidean(p1, p2)
print(res)

Cosine Distance
Is the value of cosine angle between the two points A and B.
Example
Find the cosine distsance between given points:
from scipy.spatial.distance import cosinep1
= (1, 0)
p2 = (10, 2)
res = cosine(p1, p2)
print(res)

Hamming Distance
Is the proportion of bits where two bits are difference.
It's a way to measure distance for binary sequences.
Example
Find the hamming distance between given points:
from scipy.spatial.distance import hammingp1
= (True, False, True)
p2 = (False, True, True)
res = hamming(p1, p2)
print(res)

5. Matlab Arrays
We know that NumPy provides us with methods to persist the data in readable
formats for Python. But SciPy provides us with interoperability with Matlab as well.

CS3362 Data Science Laboratory 22

Working With Matlab Arrays
We know that NumPy provides us with methods to persist the data in readable
formats for Python. But SciPy provides us with interoperability with Matlab as well.
Exporting Data in Matlab Format
The savemat() function allows us to export data in Matlab format.
The method takes the following parameters:
filename - the file name for saving data.
mdict - a dictionary containing the data.
do_compression - a boolean value that specifies whether to compress the
result or not. Default False.
Example
Export the following array as variable name "vec" to a mat file:from
scipy import io
import numpy as np
arr = np.arange(10)
io.savemat('arr.mat', {"vec": arr})

Import Data from Matlab Format

The loadmat() function allows us to import data from a Matlab file.
The function takes one required parameter:
filename - the file name of the saved data.
It will return a structured array whose keys are the variable names, and the
corresponding values are the variable values.
Example
Import the array from following mat file.:
from scipy import io
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
# Export:
io.savemat('arr.mat', {"vec": arr})#
Import:
mydata = io.loadmat('arr.mat')
print(mydata)

6. Interpolation
Interpolation is a method for generating points between given points.
For example: for points 1 and 2, we may interpolate and find points 1.33 and 1.66.
Interpolation has many usage, in Machine Learning we often deal with missing data in
a dataset, interpolation is often used to substitute those values.
This method of filling values is called imputation.
Apart from imputation, interpolation is often used where we need to smooth the
discrete points in a dataset.

CS3362 Data Science Laboratory 23

1D Interpolation
The function interp1d() is used to interpolate a distribution with 1 variable.
It takes x and y points and returns a callable function that can be called with new x
and returns corresponding y.
Example
For given xs and ys interpolate values from 2.1, 2.2... to 2.9:
from scipy.interpolate import interp1d import
numpy as np
xs = np.arange(10)
ys = 2*xs + 1
interp_func = interp1d(xs, ys)
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)

Spline Interpolation
In 1D interpolation the points are fitted for a single curve whereas in Spline
interpolation the points are fitted against a piecewise function defined with polynomials called
splines.
The UnivariateSpline() function takes xs and ys and produce a callable funciton that
can be called with new xs.
Example
Find univariate spline interpolation for 2.1, 2.2 ........ 2.9 for the following non linear points:
from scipy.interpolate import UnivariateSpline
import numpy as np
xs = np.arange(10)
ys = xs**2 + np.sin(xs) + 1 interp_func
= UnivariateSpline(xs, ys)
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)

Output

CS3362 Data Science Laboratory 24

RESULT
Thus the feature study of SciPy was completed successfully.

CS3362 Data Science Laboratory 25

1(d). Explore the features of Pandas

AIM:
To learn the different features provided by Pandas package.

ALGORITHM:
1. Install the Pandas package
2. Study all the features of Pandas package.

Pandas
 Pandas is a Python library used for working with data sets.
 It has functions for analyzing, cleaning, exploring, and manipulating data.
 Pandas allows us to analyze big data and make conclusions based on statistical
theories.
 Pandas can clean messy data sets, and make them readable and relevant.

Features
These are the important features of Pandas.
1. Series 2. DataFrames 3. Read CSV
4. Read JSON 5. Viewing the Data 6. Data Cleaning
7. Plotting

1. Series
 A Pandas Series is like a column in a table.
 It is a one-dimensional array holding data of any type.
 Create a simple Pandas Series from a list:

import pandas as pda

= [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

Create Labels
With the index argument, you can name your own labels.
Example
Create you own labels:
import pandas as pda
= [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

CS3362 Data Science Laboratory 26

Example
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

2. DataFrames
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a
table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

3. Read CSV
A simple way to store big data sets is to use CSV files (comma separated files). CSV
files contains plain text and is a well know format that can be read by everyone
including Pandas.
Example
To print maximum rows in a CSV file
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv') print(df)

4. Read JSON
 Big data sets are often stored, or extracted as JSON.
 JSON is plain text, but has the format of an object, and is well known in the world
of programming, including Pandas.
Load the JSON file into a DataFrame:
import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())

CS3362 Data Science Laboratory 27

5. Viewing the Data
One of the most used method for getting a quick overview of the DataFrame, is the
head() method. The head() method returns the headers and a specified number of rows, starting
from the top.

5.1 Info About the Data

The DataFrames object has a method called info(), that gives you more information
about the data set.
Example
Print information about the data:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.info())

6. Data Cleaning
Data cleaning means fixing bad data in your data set.
Bad data could be:
 Empty cells
 Data in wrong format
 Wrong data
 Duplicates

Empty Cells
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not
have a big impact on the result.
Example
Return a new Data Frame with no empty cells:
import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
inplace() method
It remove all rows with NULL values:
import pandas as pd
df = pd.read_csv('data.csv')
df.dropna(inplace = True)
print(df.to_string())

Replace Empty Values

Another way of dealing with empty cells is to insert a new value instead.

CS3362 Data Science Laboratory 28

Example
Replace NULL values with the number 130:
import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)

Replace Using Mean, Median, or Mode

A common way to replace empty cells, is to calculate the mean, median or mode
value of the column.
Pandas uses the mean() median() and mode() methods to calculate the respective
values for a specified column:
mean()
import pandas as pd
df = pd.read_csv('data.csv')x
= df["Calories"].mean()
df["Calories"].fillna(x, inplace = True)
print(df.to_string())
median()
import pandas as pd
df = pd.read_csv('data.csv')x
= df["Calories"].median()
df["Calories"].fillna(x, inplace = True)
mode()
import pandas as pd
df = pd.read_csv('data.csv') x
= df["Calories"].mode()[0]
df["Calories"].fillna(x, inplace = True)

Data of Wrong Format

Cells with data of wrong format can make it difficult, or even impossible, to analyze
data.
To fix it, you have two options: remove the rows, or convert all cells in the columns
into the same format.
Example
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())

Removing Rows
Remove rows with a NULL value in the "Date" column:
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])

CS3362 Data Science Laboratory 29

df.dropna(subset=['Date'], inplace = True)
print(df.to_string())

Fixing Wrong Data

Wrong Data
"Wrong data" does not have to be "empty cells" or "wrong format", it can just be
wrong, like if someone registered "199" instead of "1.99".
Sometimes you can spot wrong data by looking at the data set, because you have an
expectation of what it should be.

Replacing Values
One way to fix wrong values is to replace them with something else.
Example
Set "Duration" = 45 in row 7:
import pandas as pd
df = pd.read_csv('data.csv')
df.loc[7,'Duration'] = 45
print(df.to_string())

Removing Rows
Another way of handling wrong data is to remove the rows that contains wrong data.
Example
Delete rows where "Duration" is higher than 120:
import pandas as pd
df = pd.read_csv('data.csv')
for x in df.index:
if df.loc[x, "Duration"] > 120:
df.drop(x, inplace = True)
print(df.to_string())

Removing Duplicates
Discovering Duplicates
Duplicate rows are rows that have been registered more than one time.
duplicated() method
import pandas as pd
df = pd.read_csv('data.csv')
print(df.duplicated())

Removing Duplicates
To remove duplicates, use the drop_duplicates() method.
import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())

CS3362 Data Science Laboratory 30

7. Plotting
We can use Pyplot, a submodule of the Matplotlib library to visualize the diagram on
the screen.
Pandas uses the plot() method to create diagrams.

Scatter Plot
Specify that you want a scatter plot with the kind argument:
kind = 'scatter'
Example
import sys import
matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as pltdf
= pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')
plt.show()
plt.savefig(sys.stdout.buffer)
sys.stdout.flush()

Histogram
Use the kind argument to specify that you want a histogram:
kind = 'hist'
Example
import sys import
matplotlib
matplotlib.use('Agg')import
pandas as pd
import matplotlib.pyplot as plt df
= pd.read_csv('data.csv')
df["Duration"].plot(kind = 'hist')
plt.show()
plt.savefig(sys.stdout.buffer)
sys.stdout.flush()

CS3362 Data Science Laboratory 31

Output

CS3362 Data Science Laboratory 32

RESULT
Thus the feature study of Pandas was completed successfully.
1(e). Explore the features of statsmodels

AIM:
To learn the different features provided by statsmodels package.

ALGORITHM:
3. Install the statsmodels package
4. Study all the features of statsmodels package.

Statsmodels
statsmodels is a Python module that provides classes and functions for the estimation
of many different statistical models, as well as for conducting statistical tests, and statistical
data exploration.

Features
These are the important features of statsmodels
1. Linear regression models
2. Survival analysis

1. Linear regression models

Linear regression analysis is a statistical technique for predicting the value of one
variable(dependent variable) based on the value of another(independent variable).
In simple linear regression, there’s one independent variable used to predict a single
dependent variable. In the case of multilinear regression, there’s more than one independent
variable.
The independent variable is the one you’re using to forecast the value of the other

CS3362 Data Science Laboratory 33

variable. The statsmodels.regression.linear_model.OLS method is used to perform linear
regression. Linear equations are of the form:
Y=mX+C (m=slope; c=constant)
Syntax:
statsmodels.regression.linear_model.OLS(endog, exog=None, missing=’none’,
hasconst=None, **kwargs)
Parameters:
 endog: array like object.
 exog: array like object.
 missing: str. None, decrease, and raise are the available alternatives. If the value is
‘none,’ no nan testing is performed. Any observations with nans are dropped if ‘drop’
is selected. An error is raised if ‘raise’ is used. ‘none’ is the default.
 hasconst: None or Bool. Indicates whether a user-supplied constant is included in the
RHS. If True, k constant is set to 1 and all outcome statistics are calculated as if a
constant is present. If False, k constant is set to 0 and no constant is verified.
 **kwargs: When using the formula interface, additional arguments are utilised to set
model characteristics.

Step 1: Import packages.

Importing the required packages is the first step of modeling. The pandas, NumPy, and
stats model packages are imported.
import numpy as np
import pandas as pd
import statsmodels.api as sm
Step 2: Loading data
To access the CSV file click here. The CSV file is read using pandas.read_csv()
method. The head or the first five rows of the dataset is returned by using the head() method.
Head size and Brain weight are the columns.
df = pd.read_csv('headbrain1.csv')
df.head()
Visualizing the data:
By using the matplotlib and seaborn packages, we visualize the data. sns.regplot()
function helps us create a regression plot.
# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('headbrain1.csv')
sns.regplot('Head Size(cm^3)', 'Brain Weight(grams)', data=df)
plt.show()
Step 3: Setting a hypothesis.
Null hypothesis (H0): There is no relationship between head size and brain weight.
Alternative hypothesis (Ha): There is a relationship between head size and brain
weight.
Step 4: Fitting the model
statsmodels.regression.linear_model.OLS() method is used to get ordinary least

CS3362 Data Science Laboratory 34

squares, and fit() method is used to fit the data in it.
The ols method takes in the data and performs linear regression. we provide the
dependent and independent columns in this format :
inpendent_columns ~ dependent_column:
left side of the ~ operator contains the independent variables and right side of the
operator contains the name of the dependent variable or the predicted column.
df.columns = ['Head_size', 'Brain_weight']
model = sm.ols(formula='Head_size ~ Brain_weight', data=df).fit()
Step 5: Summary of the model.
All the summary statistics of the linear regression model are returned by the
model.summary() method. The p-value and many other values/statistics are known by this
method. Predictions about the data are found by the model.summary() method.
print(model.summary())
2. Survival analysis
The statsmodels.api.SurvfuncRight class can be used to estimate survival functions
using data that may be censored to the right. SurvfuncRight implements several inference
methods, including confidence intervals for survival quantiles, pointwise simultaneous
confidence intervals for survival functions, and plotting methods. The duration.survdiff
function provides a test procedure for comparing survival distributions.
Here we are creating a SurvfuncRight object using the data from the Moore study
available from the R dataset repository. Adjust the survival distribution for 'low' fcategory
subjects only.

Example:
# Importing libraries
import statsmodels.api as sm
X = sm.datasets.get_rdataset("Moore", "carData").data#
Filtering data of low fcategory
X = X[X['fcategory'] == "low"] #
Creating SurvfuncRight model
model = sm.SurvfuncRight(X["conformity"], X["fscore"])#
Model Summary
model.summary()

CS3362 Data Science Laboratory 35

Output

Linear regression models

36
Survival analysis

RESULT
Thus the few important features of study statsmodels was completed successfully.

Viva Questions :

1. What are the primary uses of NumPy, and how does it improve numerical computations in Python?
2. How does SciPy complement NumPy, and what additional functionalities does it provide?
3. What are the key features of Jupyter notebooks, and how do they facilitate data analysis and sharing?
4. How can Statsmodels be used for statistical modeling and hypothesis testing in Python?
5. What are the core functionalities of Pandas, and how does it simplify data manipulation and analysis?

Augmented Experiments:

1. Download and install the NumPy package. Explore its array creation and manipulation features. Perform basic
operations such as element-wise addition, multiplication, and indexing. Document the installation process and
provide examples of array manipulations.
2. Download and install the SciPy package. Explore its functionalities for scientific and technical computing, such as
optimization, integration, and signal processing. Perform an example optimization problem and document the
process and results.
3. Install Jupyter Notebook and create a notebook to document and visualize data analysis workflows. Include
examples of using Markdown, code cells, and visualizations with Matplotlib or Seaborn. Document the installation
process and provide a sample notebook.
4. Download and install the Statsmodels package. Explore its features for statistical modeling, such as linear
regression, time series analysis, and hypothesis testing. Perform a linear regression analysis on a sample dataset and
document the process and results.
5. Download and install the Pandas package. Explore its data manipulation features, such as DataFrame creation, data
indexing, filtering, and aggregation. Perform data cleaning and analysis on a sample dataset and document the
process and results.

CS3362 Data Science Laboratory 37

2.a Working with Numpy Arrays

AIM:
To work with different features provided by Numpy arrays.

ALGORITHM:
1. Install the numpy package
2. Work with all the features of numpy array.

Arrays
1. Creating Arrays
 0-D Arrays
Each value in an array is a 0-D array.
import numpy as np
arr = np.array(42)
print(arr)
 1-D Arrays
An array that has 0-D arrays as its elements is called 1-D array.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
 3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Example:
import numpy as npa
= np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

CS3362 Data Science Laboratory 38

2. Access Array Elements
Access 2-D Arrays
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
Access 3-D Arrays
To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

3. Array Slicing
 Slicing in python means taking elements from one given index to another given
index.
 We pass slice instead of index like this: [start:end].
 We can also define the step, like this: [start:end:step].
 If we don't pass start its considered 0
 If we don't pass end its considered length of array in that dimension
 If we don't pass step its considered 1

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])print(arr[1:5:2])

4. Data Types
NumPy has some extra data types, and refer to data types with one character, like i for
integers, u for unsigned integers etc.
Below is a list of all data types in NumPy and the characters used to represent them.
i - integer M - datetime
b - boolean O - object
u - unsigned integer S - string
f - float U - unicode string
c - complex float V - fixed chunk of memory
m - timedelta for other type (void)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)

CS3362 Data Science Laboratory 39

5. Copy & View
Copy:
Make a copy
import numpy as np
arr = np.array([1, 2, 3, 4, 5])x
= arr.copy()
arr[0] = 42
print(arr)
print(x)

View:
Make a view
import numpy as np
arr = np.array([1, 2, 3, 4, 5])x
= arr.view()
arr[0] = 42
print(arr)
print(x)

6. Array Shape & Reshaping

Array Shape
NumPy arrays have an attribute called shape that returns a tuple with each index
having the number of corresponding elements.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

Array Reshaping
 Reshaping means changing the shape of an array.
 The shape of an array is the number of elements in each dimension.
 By reshaping we can add or remove dimensions or change number of elements in
each dimension.
 Convert the following 1-D array with 12 elements into a 3-D array.
 The outermost dimension will have 2 arrays that contains 3 arrays, each with 2
elements:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)

7. Array Iterating
 Iterating means going through elements one by one.

CS3362 Data Science Laboratory 40

 As we deal with multi-dimensional arrays in numpy, we can do this using basic
for loop of python.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
print(x)

8. Joining Array
Joining means putting contents of two or more arrays in a single array.
import numpy as np arr1
= np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

9. Splitting Array
Splitting is reverse operation of Joining.
Joining merges multiple arrays into one and Splitting breaks one array into multiple.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

10. Searching Arrays

You can search an array for a certain value, and return the indexes that get a match.
To search an array, use the where() method.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])x
= np.where(arr == 4)
print(x)

11. Sorting means putting elements in an ordered sequence.

 Ordered sequence is any sequence that has an order corresponding to elements,
like numeric or alphabetical, ascending or descending.
 The NumPy ndarray object has a function called sort(), that will sort a specified
array.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

12. Filtering Arrays

Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list.

CS3362 Data Science Laboratory 41

If the value at an index is True that element is contained in the filtered array, if the
value at that index is False that element is excluded from the filtered array.
import numpy as np
arr = np.array([41, 42, 43, 44])x
= [True, False, True, False]
newarr = arr[x]
print(newarr)

Output :

CS3362 Data Science Laboratory 42

2b. NumPy program to convert a list of numeric values into a one-dimensional NumPy array.

Aim :

To Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array.

Program:

import numpy as np

# Creating a Python list 'l' containing floating-point numbers

l = [12.23, 13.32, 100, 36.32]

# Printing the original Python list

print("Original List:", l)

# Creating a NumPy array 'a' from the Python list 'l'

a = np.array(l)

# Printing the one-dimensional NumPy array 'a'

print("One-dimensional NumPy array: ", a)

Output:
Original List: [12.23, 13.32, 100, 36.32]

One-dimensional NumPy array: [ 12.23 13.32 100. 36.32]

CS3362 Data Science Laboratory 43

2b. NumPy program to create an array with values ranging from 12 to 38

Aim :

To write a . NumPy program to create an array with values ranging from 12 to 38.

Program:

Importing the NumPy library with an alias 'np'

import numpy as np
# Creating an array 'x' using arange() function with values from 12 to 37 (inclusive)
x = np.arange(12, 38)
# Printing the array 'x' containing values from 12 to 37
print(x)
Output:
[12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37]

2c. NumPy program to reverse an array (the first element becomes the last.

Aim:
To write a NumPy program to reverse an array (the first element becomes the last
rogram:
# Importing the NumPy library with an alias 'np'
import numpy as np
# Creating an array 'x' using arange() function with values from 12 to 37 (inclusive)
x = np.arange(12, 38)
# Printing the original array 'x' containing values from 12 to 37
print("Original array:")
print(x)
# Reversing the elements in the array 'x' and printing the reversed array
print("Reverse array:")
x = x[::-1]
print(x)
Output:
Original array:
[12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37]
Reverse array:
[37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12]

44
Viva Questions:

1. What is a NumPy array, and how is it initialized?

2. How can you check the dimensions of a NumPy array?
3. What is the difference between the zeros and ones functions in NumPy?
4. How do you find the shape of a NumPy array?
5. How can you convert a Python list into a NumPy array?

Augmented Questions:

1. Write a NumPy program to get the indices of the sorted elements of a given array.

2. Write a NumPy program to sort the specified number of elements from beginning of a given array.

3. Count the number of elements in an array within a specific range.

4. Write a NumPy program that creates a NumPy array of random numbers and uses SciPy to compute
the statistical properties (mean, median, variance) of the array.

5. Write a NumPy program to count a given word in each row of a given array of string values.

45
3.a. Working with DataFrame
AIM:
To work with dataframe provided by pandas.

ALGORITHM:
1. Install the pandas package
2. Work with all the features of dataframe.

1. DataFrame
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a
table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

2. Locate Row
As you can see from the result above, the DataFrame is like a table with rows and
columns.
Pandas use the loc attribute to return one or more specified row(s)
Example
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:df
= pd.DataFrame(data) print(df.loc[0])

3. Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]

CS3362 Data Science Laboratory 46

}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)

4. Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).
Example Return
"day2":
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df.loc["day2"])

5. Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)

Output :

CS3362 Data Science Laboratory 47

3b. Pandas program to add, subtract,multiy and divide two pandas series.

Aim:

To Write a pandas program to add, subtract, multiply and divide two pandas series.

Program:
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)
print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)

Output:

Add two Series:

0 3
1 7
2 11
3 15
4 19

CS3362 Data Science Laboratory 48

dtype: int64
Subtract two Series:
0 1
1 1
2 1
3 1
4 1
dtype: int64
Multiply two Series:
0 2
1 12
2 30
3 56
4 90
dtype: int64
Divide Series1 by Series2:
0 2.000000
1 1.333333
2 1.200000
3 1.142857
4 1.111111
dtype: float64

3b. Pandas program to convert a Numpy array to a Pandas series.

Aim:

To write a Pandas program to convert a Numpy array to a Pandas series.

Program:

import numpy as np
import pandas as pd
np_array = np.array([10, 20, 30, 40, 50])
print("NumPy array:")
print(np_array)
new_series = pd.Series(np_array)
print("Converted Pandas series:")
print(new_series)
Output:

numPy array:
[10 20 30 40 50]

49
Converted Pandas series:
0 10
1 20
2 30
3 40
4 50
dtype: int64

3c. Pandas Program to create a distance from a dictionary and display it

Aim:

To write a Pandas Program to create a distance from a dictionary and display it.
Program :
import pandas as pd
df = pd.DataFrame({'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
print(df)
Output:

X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83

Viva Questions

1. How do you create a DataFrame in pandas from a dictionary?

2. Explain the difference between .loc[] and .iloc[] for indexing and selecting data in
pandas.
3. How can you handle missing values in a DataFrame using pandas?
4. What methods can be used to group data in pandas, and how do they work?
5. How do you merge or join two DataFrames in pandas?

Augmented Questions

1. Write a Python code snippet to create a DataFrame from a dictionary with sample data.
2. Write a Pandas program to convert all the string values to upper, lower cases in a given
pandas series. Also find the length of the string values.
3. Write a Pandas program to find the index of a given substring of a DataFrame column.
4. Write code to group a DataFrame by a specific column and calculate the mean for each
group.
5. Provide a code example to merge two DataFrames on a common column and handle any
duplicate entries.

50
4. Reading data from iris data set and doing descriptive analytics on the Iris data set

AIM:
To read data from files and exploring various commands for doing descriptive
analytics on the Iris data set.

ALGORITHM:
1. Download “Iris.csv” file from GitHub.com
2. Load the “Iris.csv” into google colab.
3. Perform descriptive analysis on the Iris file.

Importing Iris.csv
 Login to google colab by using gmail.
 Login to google drive and create a folder with required name.
 Move the Iris file from system to google drive.
 Click on the “file” icon and click on “Mount Device”.
 Code will appeared on a typing area, execute the same code.
 It requires authentication verification, complete the authentication.
 After successful verification it shows the message “Mounted at /content/drive”
 Find the Iris.csv file and copy the path for future references.

About Iris Database

Iris Dataset is considered as the Hello World for data science. It contains five columns
namely – Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a
flowering plant, the researchers have measured various features of the different iris flowers
and recorded them digitally.
You can download the Iris.csv file from the above link. Now we will use the Pandas
library to load this CSV file, and we will convert it into the dataframe. read_csv() method is
used to read CSV files.

Example:
import pandas as pd
# Reading the CSV file
df = pd.read_csv("/content/drive/MyDrive/Data_Science/iris.csv")
# Printing top 5 rows
df.head()

Getting Information about the Dataset

We will use the shape parameter to get the shape of the dataset.
df.shape -> returns no of rows and columns
df.info() -> returns column data types.

51
Checking Missing Values
We will check if our data contains any missing values or not. Missing values can occur
when no information is provided for one or more items or for a whole unit. We will use the
isnull() method.

Example:
df.isnull().sum()

Checking Duplicates
Let’s see if our dataset contains any duplicates or not. Pandas drop_duplicates()
method helps in removing duplicates from the data frame.

Example:
data = df.drop_duplicates(subset ="variety",)
data

Data Visualization Visualizing

the target column
Our target column will be the Species column because at the end we will need the
result according to the species only. Let’s see a countplot for species.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df,)
plt.show()

Relation between variables

We will see the relationship between the sepal length and sepal width and also
between petal length and petal width.
Example 1: Comparing Sepal Length and Sepal Width
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x='petal.length', y='petal.width',hue='variety', data=df, )#
Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

Example 2: Comparing Petal Length and Petal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x='petal.length', y='petal.width', hue='variety', data=df, )

52
# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

Handling Correlation
Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. For any non-numeric data type columns
in the dataframe it is ignored.

Example:
data.corr(method='pearson')

Output

49
RESULT
Iris.csv file was loaded into google colab and descriptive analytics was made on the
Iris data set successfully.

Viva Questions
1. How do you read data from a text file using Python, and which functions are used for this purpose?
2. Which libraries are commonly used for reading data from Excel files in Python, and how do you use them?
3. Describe the method to read data from a web URL in Python.
4. What descriptive statistics can you compute on the Iris dataset using pandas?
5. How do you handle missing values in a dataset when performing descriptive analytics?
Augmented Questions
1. Write a Python script to read data from a text file and display the first five rows.
2. Demonstrate how to read an Excel file and calculate the mean and standard deviation of each column in the
Iris dataset.
3. Show how to fetch data from a given URL and load it into a pandas DataFrame.
4. Using the Iris dataset, write code to compute and display the median, variance, and quartiles for each feature.
5. Illustrate the process of handling missing values in the Iris dataset by replacing them with the column mean and
then computing the descriptive statistics.

50
5(a). Perform Univariate analysis on the diabetes data set

AIM:
Use the diabetes data set from UCI and Pima Indians Diabetes data set for Univariate
analysis.

ALGORITHM:
1. Download diabetes data set from UCI and Pima Indians Diabetes data set.
2. Load the above data files into google colab.
3. Perform analysis like Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.

Univariate analysis
 The term univariate analysis refers to the analysis of one variable.
 There are three common ways to perform univariate analysis on one variable:
Summary statistics – Measures the center and spread of values.
1. Central tendency — mean, median, mode
2. Dispersion — variance, standard deviation, range, interquartile
range (IQR)
3. Skewness — symmetry of data along with mean value
4. Kurtosis — peakedness of data at mean value
5. Frequency table – Describes how often different values occur.

File Importing:
# Reading the UCI file
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/Data_Science/UCI_diabetes.csv")
# Printing top 5 rowsdf.head()
# Reading the Pima file
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")
# Printing top 5 rowsdf.head()

1. Central Tendency
We can use the following syntax to calculate various summary statistics like Mean,
Median and Mode.

Mean:
It is average value of given numeric values
 Mean of UCI data
import pandas as pd

51
# Reading the UCI file
df = pd.read_csv("/content/drive/MyDrive/Data_Science/UCI_diabetes.csv")
# Mean of UCI data
df.mean(axis=0)
 Mean of Pima data
import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Mean of Pima data
df.mean(axis=0)

Median:
It is middle most value of given values
 Median of UCI data
import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/UCI_diabetes.csv") #
Median of UCI data
df.median(axis=0)

 Median of Pima data

import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Median of Pima data
df.median(axis=0)

Mode:
It is the most frequently occurring value of given numeric variables
 Mode of UCI data
import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/UCI_diabetes.csv") #
Median of UCI data
df.mode(axis=0)

 Mode of Pima data

import pandas as pd

52
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Mean of Pima data
df.mode(axis=0)

2. Dispersion
Variance
The range is the difference between the maximum and minimum values of a data set.
Example
import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
variance of the BMI column
df.loc[:,"BMI"].var()

Standard deviation
Standard deviation is a measure of how spread out the numbers are. A large standard
deviation indicates that the data is spread out, - a small standard deviation indicates that the
data is clustered closely around the mean.
Example
import pandas as pd
# Reading the UCI file
df =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Standard deviation of the BMI column
df.loc[:,"BMI"].std()

Range
Range is the simplest of the measurements but is very limited in its use, we calculate
the range by taking the largest value of the dataset and subtract the smallest value from it, in
other words, it is the difference of the maximum and minimum values of a dataset.
Example
df=pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.cs v")
print("Range is:",df.BloodPressure.max()-df.BloodPressure.min())

Interquartile range
The interquartile range, often denoted “IQR”, is a way to measure the spread of the
middle 50% of a dataset. It is calculated as the difference between the first quartile* (the 25th
percentile) and the third quartile (the 75th percentile) of a dataset.
Example
# Importing important libraries

CS3362 Data Science Laboratory 53

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('seaborn')
data =
pd.read_csv('/content/drive/MyDrive/Data_Science/Pima_diabetes.csv') #
Removing the outliers
def removeOutliers(data, col):
Q3 = np.quantile(data[col], 0.75)
Q1 = np.quantile(data[col], 0.25)
IQR = Q3 - Q1

print("IQR value for column %s is: %s" % (col, IQR))

global outlier_free_list
global filtered_data

lower_range = Q1 - 1.5 * IQR upper_range

= Q3 + 1.5 * IQR outlier_free_list = [x for
x in data[col] if (
(x > lower_range) & (x < upper_range))] filtered_data =
data.loc[data[col].isin(outlier_free_list)] for i in
data.columns:
if i == data.columns[0]:
removeOutliers(data, i)
else:
removeOutliers(filtered_data, i)

# Assigning filtered data back to our original variable

data = filtered_data
print("Shape of data after outlier removal is: ", data.shape)

3. Skewness
 Skewness essentially measures the symmetry of the distribution.
Example
# importing pandas as pd
import pandas as pd
# Creating the dataframedf
=
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv") #
skip the na values
# find skewness in each row
df.skew(axis = 0, skipna = True)

CS3362 Data Science Laborato ry 54

4. kurtosis
kurtosis determines the heaviness of the distribution tails.
Example
import pandas as pd
df =
pd.read_csv('/content/drive/MyDrive/Data_Science/Pima_diabetes.csv')
df['BloodPressure'].kurtosis()

5. Frequency
Frequency is a count of the number of occurrences a particular value occurs or appears
in our data. A frequency table displays a set of values along with the frequency with which
they appear. They allow us to better understand which data values are common and which are
uncommon.
Example
# import packages
import pandas as pd
import numpy as np#
reading csv file
data =
pd.read_csv('/content/drive/MyDrive/Data_Science/Pima_diabetes.csv') #
one way frequency table for the species column.
freq_table = pd.crosstab(data['Age'], 'BMI')#
frequency table in proportion of species
freq_table= freq_table/len(data)
freq_table

Output

CS3362 Data Science Laboratory 55

CS3362 Data Science Laboratory 56
RESULT
Thus the Univariate analysis on the Diabetes data of UCI and Pima was performed
successfully.

CS3362 Data Science Laboratory

5(b). Perform Bivariate analysis on the diabetes data set.

AIM:
To use the UCI and Pima Indians Diabetes data set for Bivariate analysis.

ALGORITHM:
1. Download diabetes data set from UCI and Pima Indians Diabetes data set.
2. Load the above data files into google colab.
3. Perform various methods of bivariate.

Bivariate analysis
The term bivariate analysis refers to the analysis of two variables. The purpose of
bivariate analysis is to understand the relationship between two variables
There are three common ways to perform bivariate analysis:
1. Scatterplots
2. Correlation Coefficients
3. Simple Linear Regression

1. Scatterplots
A scatterplot is a type of data display that shows the relationship between two
numerical variables
Example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import packages
data =
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Diabetes Outcome
g1 = data.loc[data.Outcome==1,:]
# Pregnancies, Glucose and Diabetes relation
g1.plot.scatter('Pregnancies', 'Glucose');

2. Correlation Coefficients
The correlation coefficient is a statistical measure of the strength of the relationship
between the relative movements of two variables. The values range between -1.0 and 1.0.
Correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a
perfect positive correlation. A correlation of 0.0 shows no linear relationship between the
movement of the two variables.
Example
# Import those libraries
import pandas as pd
from scipy.stats import pearsonr

CS3362 Data Science Laboratory 58

# Import your data into Pythondf
=
pd.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.csv")#
Convert dataframe into series
list1 = df['BloodPressure']
list2 = df['SkinThickness']#
Apply the pearsonr()
corr, _ = pearsonr(list1, list2) print('Pearsons
correlation: %.3f' % corr)

3. Simple Linear Regression

Simple linear regression is a statistical method that we can use to find a relationship
between two variables and make predictions. The two variables used are typically denoted as
y and x. The independent variable, or the variable used to predict the dependent variable is
denoted as x. The dependent variable, or the outcome/output, is denoted as y.
A simple linear regression model will produce a line of best fit, or the regression line.
You may have heard about drawing the line of best fit through a scatter plot of data.
Example
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset =
pd.read_csv('/content/drive/MyDrive/Data_Science/Pima_diabetes.csv') X =
dataset.iloc[:, :-1].values #get a copy of dataset exclude last column y =
dataset.iloc[:, 1].values #get array of dataset in column 1st
# Splitting the dataset into the Training set and Test setfrom
sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,random_state=0)
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

CS3362 Data Science Laboratory 59

Output

RESULT
Thus the Bivariate analysis on the Diabetes data of UCI and Pima
RESULT: wasperformed successfully.
Thus the Bivariate analysis on the diabetes data set was executed successfully.

CS3362 Data Science Laboratory 60

5(c). Perform Multiple Regression Analysis on the diabetes data set

AIM:
To use UCI and Pima Indians Diabetes data set for Multiple Regression Analysis.

ALGORITHM:
1. Download diabetes data set from UCI and Pima Indians Diabetes data set.
2. Load the above data files into google colab.
3. Perform multiple regression analysis on data sets.

Multiple Regression Analysis

Multiple regression is like linear regression, but with more than one independent
value, meaning that we try to predict a value based on two or more variables.
Example
# Pima_diabetes
import pandas
from sklearn import linear_modeldf
=
pandas.read_csv("/content/drive/MyDrive/Data_Science/Pima_diabetes.cs v")
X = df['Pregnancies ', 'Glucose ']y
= df['BloodPressure ']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#predict the Blood Pressure based on Pregnancies and Glucose level:
predictedBP = regr.predict([[4, 120]])
print(predictedBP)

# UCI-Diabetes
import pandas
from sklearn import linear_modeldf
=
pandas.read_csv("("/content/drive/MyDrive/Data_Science/UCI_diabetes. csv")
X = df[['Time', 'Code']]y
= df['Value']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#predict the Diabetes based on Time and Code:
predictedBP = regr.predict([[13:23, 46]])
print(predictedBP)

CS3362 Data Science Laboratory 61

Output

RESULT
Thus the Multiple Regression analysis on the Diabetes data of UCI and Pima was
performed successfully.

CS3362 Data Science Laboratory 62

6(a). Apply and explore Normal curves & Histograms plotting functions on UCI-Iris
data sets

AIM:
To apply and explore Normal curves & Histograms plotting functions on UCI-Iris
data sets.

ALGORITHM:
1. Download Iris data set from UCI.
2. Load the above Iris data files into google colab.
3. Plot the normal curve and Histograms for Iris data set.

Normal Curves
It is a probability function used in statistics that tells about how the data values are
distributed. It is the most important probability distribution function used in statistics because
of its advantages in real case scenarios.
Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
# import dataset
df = pd.read_csv("/content/drive/MyDrive/Data_Science/iris.csv")#
Plot between -10 and 10 with .001 steps.
x_axis = np.arange(-20, 20, 0.01)
# Calculating mean and standard deviation
mean = df["sepal.length"].mean()
sd = df.loc[:,"sepal.width"].std()
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()

Histograms plotting functions

A histogram is basically used to represent data provided in a form of some groups.It is
accurate method for the graphical representation of numerical data distribution.It is a type of
bar plot where X-axis represents the bin ranges while Y-axis gives information about
frequency.
Example
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.read_csv('/content/drive/MyDrive/Data_Science/iris.csv ')
data = df[' sepal.length']
bins = np.arange(min(data), max(data) + 1, 1)
plt.hist(data, bins = bins, density = True)

CS3362 Data Science Laboratory 63

plt.ylabel('sepal.width')
plt.xlabel( petal.length')
plt.show()

Output

RESULT
RESULT
Thus the UCI
Thusdata set wascurves
the Normal plotted&using Normal
Histograms Curvefunctions
plotting and Histogram plotting was
on UCI-Iris
executed data
successfully.
sets was performed successfully.

CS3362 Data Science Laboratory 64

6(b). Density and contour plotting functions on UCI-Iris data sets.

AIM:
To apply and explore Density & Contour plotting functions on UCI-Iris data sets.

ALGORITHM:
1. Download Iris data set from UCI.
2. Load the above Iris data files into google colab.
3. Plot the density and contour plotting for Iris data sets.

Density Plotting
Density Plot is a type of data visualization tool. It is a variation of the histogram that
uses ‘kernel smoothing’ while plotting the values. It is a continuous and smooth version of a
histogram inferred from a data.
Density plots uses Kernel Density Estimation (so they are also known as Kernel density
estimation plots or KDE) which is a probability density function. The region of plot with a
higher peak is the region with maximum data points residing between those values.

Example - Density plot of several variables

# libraries & dataset
import seaborn as sns
import matplotlib.pyplot as plt
# set a grey background (use sns.set_theme() if seaborn version 0.11.0 or
above)
sns.set(style="darkgrid") df
= sns.load_dataset('iris')
# plotting both distibutions on the same figure
fig = sns.kdeplot(df['sepal_width'], shade=True, color="r") fig =
sns.kdeplot(df['sepal_length'], shade=True, color="b")plt.show()

Contour plotting
Contour plots also called level plots are a tool for doing multivariate analysis and
visualizing 3-D plots in 2-D space. If we consider X and Y as our variables we want to plot
then the response Z will be plotted as slices on the X-Y plane due to which contours are
sometimes referred as Z-slices or iso-response.
Contour plots are widely used to visualize density, altitudes or heights of the
mountain as well as in the meteorological department.

Example
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

CS3362 Data Science Laboratory 66

px_orbital = pd.read_csv("/content/drive/MyDrive/Data_Science/iris.csv")
x = px_orbital.iloc[0, 1:]
y = px_orbital.iloc[1:, 0]
px_values = px_orbital.iloc[1:, 1:]
mpl.rcParams['font.size'] = 14
mpl.rcParams['legend.fontsize'] = 'large'
mpl.rcParams['figure.titlesize'] = 'medium'fig,
ax = plt.subplots()
ticks = np.linspace(pmin, pmax, 6)
CS = ax.contourf(x, y, px_values, cmap="RdBu", levels=levels)
ax.set_aspect('equal')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(CS, format="%.3f", ticks=ticks)

Output

RESULT
Thus the UCI data set was plotted using Density & Contour plotting was executed
successfully.

Viva Questions

1. What is univariate analysis, and why is it important in data analysis?

2. How does linear regression differ from logistic regression?
3. What is the purpose of multiple regression analysis?
4. How can you interpret the skewness and kurtosis of a dataset?
5. What are the key differences between the UCI diabetes dataset and the Pima Indians Diabetes dataset?

CS3362 Data Science Laboratory 67

Augmented Experiments :

1. Load the diabetes dataset from UCI and the Pima Indians Diabetes dataset. Perform univariate analysis to
calculate frequency, mean, median, mode, variance, standard deviation, skewness, and kurtosis for each
dataset. Document the steps and results with example code and summary statistics.
2. Using the UCI diabetes dataset, perform linear regression to model the relationship between a chosen
independent variable and the target variable. Repeat the process for the Pima Indians Diabetes dataset.
Document the steps and compare the results.
3. Perform logistic regression on both the UCI diabetes dataset and the Pima Indians Diabetes dataset to
predict the presence of diabetes based on selected features. Document the process and compare the
results.
4. Conduct multiple regression analysis on the UCI diabetes dataset to predict the target variable using
multiple independent variables. Repeat the process for the Pima Indians Diabetes dataset. Document the
steps and compare the results.
5. Compare the univariate, bivariate (linear and logistic regression), and multiple regression analysis results
for the UCI diabetes dataset and the Pima Indians Diabetes dataset. Discuss any similarities and
differences observed in the analysis. Document the comparison with example code and results.

68
6(c). Correlation and scatter plotting functions on UCI data sets.

AIM:
To apply and correlation & Scatter plotting functions on UCI-Iris data sets.

ALGORITHM:
1. Download Iris data set from UCI.
2. Load the above Iris data files into google colab.
3. Plot the correlation and scatter plotting for Iris data sets.

Correlation Matrix Plotting

Correlation gives an indication of how related the changes are between two variables.
If two variables change in the same direction they are positively correlated. If the change in
opposite directions together (one goes up, one goes down), then they are negatively
correlated.
You can calculate the correlation between each pair of attributes. This is called a
correlation matrix. You can then plot the correlation matrix and get an idea of which
variables have a high correlation with each other.
This is useful to know, because some machine learning algorithms like linear and
logistic regression can have poor performance if there are highly correlated input variables in
your data.

Example
# Correction Matrix Plot import
matplotlib.pyplot as pltimport
pandas
import numpy
url =
"https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-
indians-diabetes.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']data =
pandas.read_csv(url, names=names)
correlations = data.corr() #
plot correlation matrix fig
= plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = numpy.arange(0,9,1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(names)
ax.set_yticklabels(names)
plt.show()

CS3362 Data Science Laboratory 69

Scatter Plotting
A scatterplot shows the relationship between two variables as dots in two dimensions,
one axis for each attribute. You can create a scatterplot for each pair of attributes in your data.
Drawing all these scatterplots together is called a scatterplot matrix.
Scatter plots are useful for spotting structured relationships between variables, like
whether you could summarize the relationship between two variables with a line. Attributes
with structured relationships may also be correlated and good candidates for removal from your
dataset.

Example
# Scatterplot Matrix
import matplotlib.pyplot as plt
import pandas
from pandas.plotting import scatter_matrix
url =
"https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-
indians-diabetes.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']data =
pandas.read_csv(url, names=names)
scatter_matrix(data)
plt.show()

Output

CS3362 Data Science Laboratory 70

RESULT
Thus the UCI data set was plotted using Correlation and scatter plotting was executed
successfully.

Viva Questions :

1. What is the purpose of using normal curves in data visualization?

2. How can density and contour plots help in understanding data distributions?
3. What insights can be gained from correlation and scatter plots?
4. How do histograms help in visualizing data distributions?
5. What are the advantages of three-dimensional plotting in data visualization?

Augmented Experiments :

1. Load a dataset from the UCI repository and plot a normal curve for one of the continuous variables.
Document the steps and results with example code and plots.
2. Use the same dataset to create a density plot and a contour plot for two continuous variables. Document
the process and provide the resulting plots.
3. Create a scatter plot to visualize the correlation between two variables from the dataset. Calculate and
display the correlation coefficient. Document the process with example code and plots.
4. Generate histograms for several variables in the dataset to explore their distributions. Document the steps
and provide the histograms.
5. Create a three-dimensional plot for three continuous variables from the dataset. Document the process
with example code and the resulting plot.

CS3362 Data Science Laboratory 70

7. Visualizing Geographic Data with Basemap

AIM:
To visualizing the Geographic Data with Basemap using Zomato geographic data.

ALGORITHM:
1. Study the basics of Basemap.
2. Use Zomato data to plot city names and restaurants details.

Basemap Introduction
Basemap is a toolkit under the Python visualization library Matplotlib. Its main function
is to draw 2D maps, which are important for visualizing spatial data. basemap itself does not
do any plotting, but provides the ability to transform coordinates into one of 25 different map
projections.

Zomato data Visualization

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from glob import glob as gb

#list all the directories

dirs=os.listdir("C:/Users/IT LAB-I\Desktop/Data_Science/zomato_data/")dirs

len(dirs)

#storing all the files from every directory

li=[]
for dir1 in dirs:
files=os.listdir(r"C:/Users/IT LAB-I\Desktop/Data_Science/zomato_data/"+dir1) #reading
each file from list of files from previous step and creating pandas data famefor file in files:

df_file=pd.read_csv("C:/Users/IT LAB-
I\Desktop/Data_Science/zomato_data/"+dir1+"/"+file,quotechar='"',delimiter="|") #appending
the dataframe into a list
li.append(df_file.values)

len(li)

71
#numpys vstack method to append all the datafames to stack the sequence of inputvertically
to make a single array
df_np=np.vstack(li)

#no of rows is represents the total no restaurants ,now of coloumns(12) is columns forthe
dataframe
df_np.shape

#creating final dataframe from the numpy array

df_final=pd.DataFrame(df_np)

#adding the header columns

df_final=pd.DataFrame(df_final.values, columns
=["NAME","PRICE","CUSINE_CATEGORY","CITY","REGION","URL","PAGE
NO","CUSINE TYPE","TIMING","RATING_TYPE","RATING","VOTES"])

#displaying the dataframe

df_final

#header column "PAGE NO" is not required ,i used it while scraping the data fromzomato to
do some sort of validation,lets remove the column df_final.drop(columns=["PAGE
NO"],axis=1,inplace=True)

#display the dataframe again

df_final

#lets count how many unique cities are there

df_final["CITY"].unique()

# import json and requests library to use googl apis to get the longitude ant latituidevalues
import requests
import json
#creating a separate array with all city names as elements of array
city_name=df_final["CITY"].unique()
li1=[]
#googlemap api calling url
geo_s ='https://maps.googleapis.com/maps/api/geocode/json'
#iterating through a for loop for each city names
for i in range(len(city_name)):
#i have used my own google map api, please use ypur own api
param = {'address': city_name[i], 'key': 'AIzaSyD-kYTK-
8FQGueJqA2028t2YHbUX96V0vk'}
response = requests.get(geo_s, params=param)

CS3362 Data Science Laboratory 72

response=response.text
data=json.loads(response)
#setting up the variable with corresponding city longitude and latitude
lat=data["results"][0]["geometry"]["location"]["lat"]
lng=data["results"][0]["geometry"]["location"]["lng"]
#creating a new data frame with city , latitude and longitude as columns
df2=pd.DataFrame([[city_name[i],lat,lng]])
li1.append(df2.values)

#numpys vstack method to append all the datafames to stack the sequence of input
vertically to make a single array
df_np=np.vstack(li1)

#creating a second dataframe with city name, latitude and longitude

df_sec=pd.DataFrame(df_np,columns=["CITY","lat","lng"])

#display the second dataframe contents

df_sec

#merge this data frame to the existing df_final data frame using merge and join featuresfrom
pandas,and creating a new data frame
df_final2=df_final.merge(df_sec,on="CITY",how="left")

#display the contents , it will have longitude and latitude now

df_final2

#creating pandas series to hold the citynames and corresponding count of restuarnats in
ascending order
li2=df_final["CITY"].value_counts().sort_values(ascending=True)li2

#creating a empty dictionary

dc={}
#setting dictionary values as city name , count of restuarnat and key will city names aswell
for i,j in li2.items():
x=i + "," +str(j)
dc.update({i:[i,j]})

#displaying the dictionary

CS3362 Data Science Laboratory 73

#creating another data frame from the above dictionary
df_map=pd.DataFrame.from_dict(dc,orient="index",columns=["CITY","COUNT"])

#displaying the data frame

df_map

#merging this data frame with df_sec data frame(which we created using citynames,longitude
and latitude) df_map_final=df_map.merge(df_sec,on="CITY",how="left")

#displaying the new data frame this frame will be used for map ploting
df_map_final

#importing the libraries for map ploting

from matplotlib import cm
from matplotlib.dates import date2num from
mpl_toolkits.basemap import Basemap# for
date and time processing
import datetime

#lets take one data frame for top 20 cities with most retaurants countsdf_plot_top=df_map_final.tail(20)

#displaying the data frame

df_plot_top

#lets plot this inside the map corresponding to the cities exact co-ordinates which we
received from google api
#plt.subplots(figsize=(20,50))
plt.figure(figsize=(50,60))
map=Basemap(width=120000,height=900000,projection="lcc",resolution="l",llcrnrlon
=67,llcrnrlat=5,urcrnrlon=99,urcrnrlat=37,lat_0=28,lon_0=77)
map.drawcountries()
map.drawmapboundary(color='#f2f2f2')
map.drawcoastlines()
lg=np.array(df_plot_top["lng"])
lat=np.array(df_plot_top["lat"])
pt=np.array(df_plot_top["COUNT"])
city_name=np.array(df_plot_top["CITY"])
x,y=map(lg,lat)
#using lambda function to create different sizes of marker as per thecount
p_s=df_plot_top["COUNT"].apply(lambda x: int(x)/2)
#plt.scatter takes logitude ,latitude, marker size,shape,and color as parameter in thebelow , in
this plot marker color is always blue.

74
plt.scatter(x,y,s=p_s,marker="o",c='BLUE')
plt.title("TOP 20 INDIAN CITIES RESTAURANT COUNTS PLOT AS PER
ZOMATO",fontsize=30,color='RED')

#lets plot this inside the map corresponding to the cities exact co-ordinates which we
received from google api ,here marker color will be different as per marker size
#plt.subplots(figsize=(20,50))
plt.figure(figsize=(50,60))
map=Basemap(width=120000,height=900000,projection="lcc",resolution="l",llcrnrlon
=67,llcrnrlat=5,urcrnrlon=99,urcrnrlat=37,lat_0=28,lon_0=77)
map.drawcountries()
map.drawmapboundary(color='#f2f2f2')
map.drawcoastlines()
lg=np.array(df_plot_top["lng"])
lat=np.array(df_plot_top["lat"])
pt=np.array(df_plot_top["COUNT"])
city_name=np.array(df_plot_top["CITY"])
x,y=map(lg,lat)
#using lambda function to create different sizes of marker as per thecount
p_s=df_plot_top["COUNT"].apply(lambda x: int(x)/2)
#plt.scatter takes logitude ,latitude, marker size,shape,and color as parameter in thebelow
, in this plot marker color is different. plt.scatter(x,y,s=p_s,marker="o",c=p_s)
plt.title("TOP 20 INDIAN CITIES RESTAURANT COUNTS PLOT AS PER
ZOMATO",fontsize=30,color='RED')

#lets plot with the city names inside the map corresponding to the cities exact co- ordinates
which we received from google api ,here marker color will be different as permarker size
#plt.subplots(figsize=(20,50))
plt.figure(figsize=(50,60))
map=Basemap(width=120000,height=900000,projection="lcc",resolution="l",llcrnrlon
=67,llcrnrlat=5,urcrnrlon=99,urcrnrlat=37,lat_0=28,lon_0=77)
map.drawcountries()
map.drawmapboundary(color='#f2f2f2')
map.drawcoastlines()
lg=np.array(df_plot_top["lng"])
lat=np.array(df_plot_top["lat"])
pt=np.array(df_plot_top["COUNT"])
city_name=np.array(df_plot_top["CITY"])
x,y=map(lg,lat)
#using lambda function to create different sizes of marker as per thecount
p_s=df_plot_top["COUNT"].apply(lambda x: int(x)/2)

CS3362 Data Science Laboratory 75

#plt.scatter takes logitude ,latitude, marker size,shape,and color as parameter in thebelow
, in this plot marker color is different. plt.scatter(x,y,s=p_s,marker="o",c=p_s)
for a,b ,c,d in zip(x,y,city_name,pt):
#plt.text takes x position , y position ,text ,font size and color as arguments
plt.text(a,b,c,fontsize=30,color="r")
plt.title("TOP 20 INDIAN CITIES RESTAURANT COUNTS PLOT AS PER
ZOMATO",fontsize=30,color='RED')

#lets plot with the city names and restaurants count inside the map corresponding to thecities
exact co-ordinates which we received from google api ,here marker color will be different as
per marker size
#plt.subplots(figsize=(20,50))
plt.figure(figsize=(50,60))
map=Basemap(width=120000,height=900000,projection="lcc",resolution="l",llcrnrlon
=67,llcrnrlat=5,urcrnrlon=99,urcrnrlat=37,lat_0=28,lon_0=77)
map.drawcountries()
map.drawmapboundary(color='#f2f2f2')
map.drawcoastlines()
lg=np.array(df_plot_top["lng"])
lat=np.array(df_plot_top["lat"])
pt=np.array(df_plot_top["COUNT"])
city_name=np.array(df_plot_top["CITY"])
x,y=map(lg,lat)
#using lambda function to create different sizes of marker as per thecount
p_s=df_plot_top["COUNT"].apply(lambda x: int(x)/2)
#plt.scatter takes logitude ,latitude, marker size,shape,and color as parameter in thebelow
, in this plot marker color is different. plt.scatter(x,y,s=p_s,marker="o",c=p_s)
for a,b ,c,d in zip(x,y,city_name,pt):
#plt.text takes x position , y position ,text(city name) ,font size and color as arguments
plt.text(a,b,c,fontsize=30,color="r")
#plt.text takes x position , y position ,text(restaurant counts) ,font size and color as
arguments, like above . but only i have changed the x and y position to make it moreclean
and easier to read
plt.text(a+60000,b+30000,d,fontsize=30)
plt.title("TOP 20 INDIAN CITIES RESTAURANT COUNTS PLOT AS PER
ZOMATO",fontsize=30,color='RED')

76
Output

CS3362 Data Science Laboratory 77

RESULT

Thus the visualization of Zomato geographic data was visualized using Basemap.

Augmented questions :

1. Write a Python program using Basemap to create an interactive map that displays the
locations of major cities around the world. Include functionality to zoom in and out, and add
labels to each city. How would you integrate Basemap with other libraries to enhance
interactivity?

CS3362 Data Science Laboratory 78

2. Develop a Python script to visualize global climate data using Basemap. Create a map that
displays temperature anomalies with a color gradient. Integrate Basemap with data from a
CSV file containing latitude, longitude, and temperature anomaly values. How would you
handle large datasets and ensure efficient rendering of the map?

Viva questions:

1. What is the Basemap toolkit, and how is it used for visualizing geographic data in Python?
2. How would you plot a simple map of a specific region or country using Basemap? What are
the basic steps involved in creating such a map?
3. Explain how to overlay markers or data points on a Basemap. What functions or methods are
used to add these elements to the map?
4. How can you display geographic data such as country borders, rivers, or cities on a map using
Basemap? What are some common map features you can add?
5. Describe how to customize the appearance of a map created with Basemap, such as changing
the map's projection, adding gridlines, or adjusting the map's color scheme.

CS3362 Data Science Laboratory 79

MATH1005 Final Exam 2022
No ratings yet
MATH1005 Final Exam 2022
17 pages
hw3 P
50% (2)
hw3 P
15 pages
Mca Data Structure Journal Sem II
No ratings yet
Mca Data Structure Journal Sem II
27 pages
Laboratory Manual: Silver Oak College of Engineering and Technology
No ratings yet
Laboratory Manual: Silver Oak College of Engineering and Technology
27 pages
Fest Management System Abstract
56% (9)
Fest Management System Abstract
2 pages
Resume Builder
No ratings yet
Resume Builder
12 pages
Data Structures & System Programming Lab File
0% (1)
Data Structures & System Programming Lab File
29 pages
Principles of Compiler Design
No ratings yet
Principles of Compiler Design
36 pages
Shell Prog Finalpdf
No ratings yet
Shell Prog Finalpdf
43 pages
MCA Sem Java Program Solution
No ratings yet
MCA Sem Java Program Solution
16 pages
Mcs 023
No ratings yet
Mcs 023
261 pages
Telecom Billing System: Pragallath P.S
No ratings yet
Telecom Billing System: Pragallath P.S
16 pages
Anna University Questions Department of CSE III Year CS1005 - Advanced Java Programming (Elective) Unit I 2 Marks
No ratings yet
Anna University Questions Department of CSE III Year CS1005 - Advanced Java Programming (Elective) Unit I 2 Marks
5 pages
(Csc. 351 Software Engineering) : Lecturer: Hiranya Bastakoti
No ratings yet
(Csc. 351 Software Engineering) : Lecturer: Hiranya Bastakoti
13 pages
Sarvani Madiraju Resume
No ratings yet
Sarvani Madiraju Resume
1 page
Ds&Rp (All Units Notes)
No ratings yet
Ds&Rp (All Units Notes)
114 pages
Pps Practical File
100% (1)
Pps Practical File
61 pages
Oops
No ratings yet
Oops
28 pages
B. Sc. CSIT Final Year Project Work:: Structuring Report, Presentation and Evaluation
No ratings yet
B. Sc. CSIT Final Year Project Work:: Structuring Report, Presentation and Evaluation
41 pages
MCA Mini Project Report Format 12-2023
No ratings yet
MCA Mini Project Report Format 12-2023
8 pages
Web Programming Unit - I
80% (10)
Web Programming Unit - I
45 pages
Aim Algorithm Result
100% (1)
Aim Algorithm Result
10 pages
CAHM Unit 1 Notes
No ratings yet
CAHM Unit 1 Notes
16 pages
Coa Lab Manual
No ratings yet
Coa Lab Manual
21 pages
Parking Management Solution
No ratings yet
Parking Management Solution
11 pages
MCA-Sample Resume
No ratings yet
MCA-Sample Resume
2 pages
Adv. Java Lab File
No ratings yet
Adv. Java Lab File
94 pages
Java Input/Output - Text and Binary Streams: Introduction To Data Streams
No ratings yet
Java Input/Output - Text and Binary Streams: Introduction To Data Streams
10 pages
Computer Graphics and Animation Lab: BCAP 286
No ratings yet
Computer Graphics and Animation Lab: BCAP 286
6 pages
Report Writing Presentation Evaluation Guidelines For BSC CSIT Project
No ratings yet
Report Writing Presentation Evaluation Guidelines For BSC CSIT Project
39 pages
Web Lab Report 2
No ratings yet
Web Lab Report 2
6 pages
Updated CN COURSE FILE R20 CSE-B, C
No ratings yet
Updated CN COURSE FILE R20 CSE-B, C
96 pages
21cs53 Super-Imp-Tie-23
No ratings yet
21cs53 Super-Imp-Tie-23
2 pages
East West Institute of Technology: Sadp Notes
No ratings yet
East West Institute of Technology: Sadp Notes
30 pages
Syllabus of MCA - Bridge Course (Mangt) (2020patt.) 2020 - 2022 - 13.05.2021
No ratings yet
Syllabus of MCA - Bridge Course (Mangt) (2020patt.) 2020 - 2022 - 13.05.2021
13 pages
B. SC Computer Science
100% (1)
B. SC Computer Science
5 pages
Practical Lab File Based ON Programing in C: Submitted by
No ratings yet
Practical Lab File Based ON Programing in C: Submitted by
6 pages
B.tech 20-21 Internship
No ratings yet
B.tech 20-21 Internship
9 pages
Database Management System: Course Code: CSL-402
No ratings yet
Database Management System: Course Code: CSL-402
49 pages
Module 5
No ratings yet
Module 5
16 pages
College Event Management System
No ratings yet
College Event Management System
53 pages
Grid Computing All Question
No ratings yet
Grid Computing All Question
5 pages
Cse4001 Parallel-And-Distributed-Computing Eth 1.1 47 Cse4001
50% (2)
Cse4001 Parallel-And-Distributed-Computing Eth 1.1 47 Cse4001
2 pages
Python Programming - Syllabus
No ratings yet
Python Programming - Syllabus
1 page
BCA-244 C++ Programming Laboratory
No ratings yet
BCA-244 C++ Programming Laboratory
51 pages
Limitations of Algorithm Power
100% (1)
Limitations of Algorithm Power
10 pages
Software Construction Lecture 1
No ratings yet
Software Construction Lecture 1
30 pages
Te Aids - (Elective-I) Human Computer Interface
No ratings yet
Te Aids - (Elective-I) Human Computer Interface
2 pages
Java Practical
No ratings yet
Java Practical
57 pages
5th & 6th Sem Question Paper of BSC Computer Science (ASSAM UNIVERSITY)
No ratings yet
5th & 6th Sem Question Paper of BSC Computer Science (ASSAM UNIVERSITY)
16 pages
QuestionBank LabPractcals
No ratings yet
QuestionBank LabPractcals
12 pages
Major Synopsis IPU PDF
No ratings yet
Major Synopsis IPU PDF
17 pages
Java Learning Roadmap For Beginners - Kabirosky
No ratings yet
Java Learning Roadmap For Beginners - Kabirosky
6 pages
TEACHING AND EVALUATION SCHEME FOR 5th Semester (CSE) (Wef 2020-21)
No ratings yet
TEACHING AND EVALUATION SCHEME FOR 5th Semester (CSE) (Wef 2020-21)
25 pages
Core Java Program List 2023 PDF
No ratings yet
Core Java Program List 2023 PDF
1 page
Introduction to Linux: Installation and Programming
From Everand
Introduction to Linux: Installation and Programming
N. B. Venkateswarlu
No ratings yet
Optimizing Hadoop for MapReduce
From Everand
Optimizing Hadoop for MapReduce
Khaled Tannir
No ratings yet
AppDynamics Third Edition
From Everand
AppDynamics Third Edition
Gerardus Blokdyk
No ratings yet
Equity of Cybersecurity in the Education System: High Schools, Undergraduate, Graduate and Post-Graduate Studies.
From Everand
Equity of Cybersecurity in the Education System: High Schools, Undergraduate, Graduate and Post-Graduate Studies.
Joseph O. Esin
No ratings yet
Data science lab
No ratings yet
Data science lab
61 pages
CS3361 Data Science Lab Manual
No ratings yet
CS3361 Data Science Lab Manual
82 pages
Forestry Drone
No ratings yet
Forestry Drone
6 pages
Introduction to Operating Systems - CS3451 - Notes - Unit 1 - Introduction
No ratings yet
Introduction to Operating Systems - CS3451 - Notes - Unit 1 - Introduction
36 pages
Ml _250508_092701
No ratings yet
Ml _250508_092701
2 pages
AL3451-MACHINE LEARNING
No ratings yet
AL3451-MACHINE LEARNING
87 pages
Ml _250507_164740
No ratings yet
Ml _250507_164740
2 pages
DS Unit 2
No ratings yet
DS Unit 2
33 pages
DS Unit 5
No ratings yet
DS Unit 5
17 pages
Push Button Hands-On Document
No ratings yet
Push Button Hands-On Document
5 pages
Augumented Reality
No ratings yet
Augumented Reality
5 pages
Hafiz Khawar
No ratings yet
Hafiz Khawar
11 pages
TechQuiz MediumQAs
No ratings yet
TechQuiz MediumQAs
3 pages
TechQuiz HardQAs
No ratings yet
TechQuiz HardQAs
3 pages
Link State Routing Protocols
No ratings yet
Link State Routing Protocols
49 pages
Advanced Data Structures and Algorithms Roadmap PDF by ScholarHat
No ratings yet
Advanced Data Structures and Algorithms Roadmap PDF by ScholarHat
33 pages
Template-FYP Project Report - 13122018
No ratings yet
Template-FYP Project Report - 13122018
24 pages
4th Sem Syllabus
No ratings yet
4th Sem Syllabus
10 pages
Ijkstra S Algorithm: by Laksman Veeravagu and Luis Barrera
No ratings yet
Ijkstra S Algorithm: by Laksman Veeravagu and Luis Barrera
23 pages
Single Source Shortest Paths
No ratings yet
Single Source Shortest Paths
17 pages
Dsa MCQ
No ratings yet
Dsa MCQ
34 pages
Artificial Intelligence and Machine Learning - Final
No ratings yet
Artificial Intelligence and Machine Learning - Final
243 pages
Dijkstra's Algorithm: 1 N Ij I J 1
No ratings yet
Dijkstra's Algorithm: 1 N Ij I J 1
5 pages
Download Full Artificial Intelligence For Games 2nd Edition Ian Millington PDF All Chapters
100% (10)
Download Full Artificial Intelligence For Games 2nd Edition Ian Millington PDF All Chapters
82 pages
Studying Solutions of The P-Median Problem For The Location of Public Bike Stations
No ratings yet
Studying Solutions of The P-Median Problem For The Location of Public Bike Stations
11 pages
Part 2 - Graph Algorithms and Data Structures
No ratings yet
Part 2 - Graph Algorithms and Data Structures
28 pages
Computer Networks Lab Syllabus III Year B.Tech. CSE - I Sem
No ratings yet
Computer Networks Lab Syllabus III Year B.Tech. CSE - I Sem
87 pages
Dijkstra Shortest Path Algorithm Using Global Positioning System
No ratings yet
Dijkstra Shortest Path Algorithm Using Global Positioning System
7 pages
Major Project Final Report 3
No ratings yet
Major Project Final Report 3
57 pages
Implementation Dijkstra's Path Finding: Problem
No ratings yet
Implementation Dijkstra's Path Finding: Problem
4 pages
L9 Greedy Algorithms Introduction
No ratings yet
L9 Greedy Algorithms Introduction
19 pages
cz3005 Asg1 Report
No ratings yet
cz3005 Asg1 Report
6 pages
Buy ebook (Ebook) AI for Games, 3e by Millington, Ian ISBN 9781138483972, 9781351053297, 1138483974, 1351053299 cheap price
100% (9)
Buy ebook (Ebook) AI for Games, 3e by Millington, Ian ISBN 9781138483972, 9781351053297, 1138483974, 1351053299 cheap price
65 pages
Daa Lab Manual Cse III I Sem
No ratings yet
Daa Lab Manual Cse III I Sem
47 pages
A Genetic Algorithm For The Optimization of Cable Routing
No ratings yet
A Genetic Algorithm For The Optimization of Cable Routing
11 pages
Problem Set 1
No ratings yet
Problem Set 1
13 pages
Assignment 2
No ratings yet
Assignment 2
33 pages
International Journal of Transportation Science and Technology
No ratings yet
International Journal of Transportation Science and Technology
14 pages
CS 312 Project 3: Intelligent Scissors
No ratings yet
CS 312 Project 3: Intelligent Scissors
6 pages
6173e574-d438-45f0-b533-c45f77b7fff8_Untitled
No ratings yet
6173e574-d438-45f0-b533-c45f77b7fff8_Untitled
72 pages
Dijkstras Algorithm
No ratings yet
Dijkstras Algorithm
39 pages
Problems and Solution_2023-2024-19-9-2024
No ratings yet
Problems and Solution_2023-2024-19-9-2024
15 pages
Unit IV: Graphs (Refer T-1 and R-6)
No ratings yet
Unit IV: Graphs (Refer T-1 and R-6)
27 pages