0% found this document useful (0 votes)
994 views12 pages

Lab - Manual FDS

The document describes the contents and objectives of a data science laboratory course. It includes: 1. The course objectives are to understand Python libraries for data science like NumPy, Pandas, and Matplotlib, and to learn statistical analysis, data visualization, and machine learning techniques. 2. The list of experiments cover downloading and using data science packages, working with NumPy arrays, reading and exploring data, and applying techniques like regression, correlation, and plotting. 3. Students will learn to use Python libraries for data science, apply statistical measures, perform descriptive analytics, and present data visually.

Uploaded by

Prabha K
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
994 views12 pages

Lab - Manual FDS

The document describes the contents and objectives of a data science laboratory course. It includes: 1. The course objectives are to understand Python libraries for data science like NumPy, Pandas, and Matplotlib, and to learn statistical analysis, data visualization, and machine learning techniques. 2. The list of experiments cover downloading and using data science packages, working with NumPy arrays, reading and exploring data, and applying techniques like regression, correlation, and plotting. 3. Students will learn to use Python libraries for data science, apply statistical measures, perform descriptive analytics, and present data visually.

Uploaded by

Prabha K
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

DATA SCIENCE LABORATORY

DATA SCIENCE
LAB MANUAL
(Under Revision)

Prepared & Consolidated


by
Vignesh L S

VIGNESH LS 1
DATA SCIENCE LABORATORY

CS3362 DATA SCIENCE LABORATORY (Under Revision)


COURSE OBJECTIVES:
 To understand the python libraries for data science
 To understand the basic Statistical and Probability measures for data science.
 To learn descriptive analytics on the benchmark data sets.
 To apply correlation and regression analytics on standard data sets.
 To present and interpret data using visualization packages in Python.
LIST OF EXPERIMENTS:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for
doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
 Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
 Bivariate analysis: Linear and logistic regression modeling
 Multiple Regression analysis
 Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
 Normal curves
 Density and contour plots
 Correlation and scatter plots
 Histograms
 Three dimensional plotting
7. Visualizing Geographic Data with Basemap

LIST OF EQUIPMENTS :(30 Students per Batch)


Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statmodels, seaborn, plotly, bokeh
Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

VIGNESH LS 2
DATA SCIENCE LABORATORY

COURSE OUTCOMES:
At the end of this course, the students will be able to:
 CO1: Make use of the python libraries for data science
 CO2: Make use of the basic Statistical and Probability measures for data science.
 CO3: Perform descriptive analytics on the benchmark data sets.
 CO4: Perform correlation and regression analytics on standard data sets
 CO5: Present and interpret data using visualization packages in Python.

VIGNESH LS 3
DATA SCIENCE LABORATORY

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages.
Aim
To Download and install python and its packages using pip installation
Procedure
Install Python Data Science Packages
Python is a high-level and general-purpose programming language with data science and
machine learning packages. Use the video below to install on Windows, MacOS, or Linux. As
a first step, install Python for Windows, MacOS, or Linux.
Python Packages
The power of Python is in the packages that are available either through the pip or conda
package managers. This page is an overview of some of the best packages for machine
learning and data science and how to install them.
We will explore the Python packages that are commonly used for data science and machine
learning. You may need to install the packages from the terminal, Anaconda prompt,
command prompt, or from the Jupyter Notebook. If you have multiple versions of Python or
have specific dependencies then use an environment manager such as pyenv. For most users,
a single installation is typically sufficient. The Python package manager pip has all of the
packages (such as gekko) that we need for this course. If there is an administrative access
error, install to the local profile with the --user flag.

pip install gekko

Gekko
Gekko provides an interface to gradient-based solvers for machine learning and
optimization of mixed-integer, differential algebraic equations, and time series models.
Gekko provides exact first and second derivatives through automatic differentiation and
discretization with simultaneous or sequential methods.

pip install gekko

Keras
Keras provides an interface for artificial neural networks. Keras acts as an interface for the
TensorFlow library. Other backend packages were supported until version 2.4. TensorFlow
is now the only backend and is installed separately with pip install tensorflow.

pip install keras

VIGNESH LS 4
DATA SCIENCE LABORATORY

Matplotlib
The package matplotlib generates plots in Python.

pip install matplotlib

Numpy
Numpy is a numerical computing package for mathematics, science, and engineering. Many
data science packages use Numpy as a dependency.

pip install numpy

OpenCV
OpenCV (Open Source Computer Vision Library) is a package for real-time computer vision
and developed with support from Intel Research.

pip install opencv-python

Pandas
Pandas visualizes and manipulates data tables. There are many functions that allow efficient
manipulation for the preliminary steps of data analysis problems.

pip install pandas

Plotly
Plotly renders interactive plots with HTML and JavaScript. Plotly Express is included with
Plotly.

pip install plotly

PyTorch
PyTorch enables deep learning, computer vision, and natural language processing.
Development is led by Facebook's AI Research lab (FAIR).

pip install torch

Scikit-Learn
Scikit-Learn (or sklearn) includes a wide variety of classification, regression and clustering
algorithms including neural network, support vector machine, random forest, gradient
boosting, k-means clustering, and other supervised or unsupervised learning methods.

pip install scikit-learn

SciPy
SciPy is a general-purpose package for mathematics, science, and engineering and extends
the base capabilities of NumPy.

pip install scipy


VIGNESH LS 5
DATA SCIENCE LABORATORY

Seaborn
Seaborn is built on matplotlib, and produces detailed plots in few lines of code.

pip install seaborn

Statsmodels
Statsmodels is a package for exploring data, estimating statistical models, and performing
statistical tests. It include descriptive statistics, statistical tests, plotting functions, and result
statistics.

pip install statsmodels

TensorFlow
TensorFlow is an open source machine learning platform with particular focus on training
and inference of deep neural networks. Development is led by the Google Brain team.

pip install tensorflow

VIGNESH LS 6
DATA SCIENCE LABORATORY

Working with Numpy arrays


CREATE A NUMPY NDARRAY OBJECT
NumPy is used to work with arrays. The array object in NumPy is called ndarray.

Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))

To create an ndarray, we can pass a list, tuple or any array-like object into the array()
method, and it will be converted into an ndarray:

Example
Use a tuple to create a NumPy array:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)

Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.

Example
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)

1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.

Example
Create a 1-D array containing the values 1,2,3,4,5:

VIGNESH LS 7
DATA SCIENCE LABORATORY

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array. These are often used to
represent matrix or 2nd order tensors.

Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
These are often used to represent a 3rd order tensor.

Example
Create a 3-D array with two 2-D arrays, both containing two arrays with the values
1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

Check Number of Dimensions?


NumPy Arrays provides the ndim attribute that returns an integer that tells us how many
dimensions the array have.

Example
Check how many dimensions the arrays have:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)

VIGNESH LS 8
DATA SCIENCE LABORATORY

print(b.ndim)
print(c.ndim)
print(d.ndim)

Higher Dimensional Arrays


An array can have any number of dimensions.
When the array is created, you can define the number of dimensions by using the ndmin
argument.

Example
Create an array with 5 dimensions and verify that it has 5 dimensions:
import numpy as np
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('number of dimensions :', arr.ndim)

In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element
that is the vector, the 3rd dim has 1 element that is the matrix with the vector, the 2nd dim
has 1 element that is 3D array and 1st dim has 1 element that is a 4D array.

VIGNESH LS 9
DATA SCIENCE LABORATORY

Working with Pandas data frames


A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table
with rows and columns.

Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

Result
calories duration
0 420 50
1 380 40
2 390 45

Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas use the loc attribute to return one or more specified row(s)

Example
Return row 0:
#refer to the row index:

print(df.loc[0])

VIGNESH LS 10
DATA SCIENCE LABORATORY

Result
calories 420
duration 50
Name: 0, dtype: int64
Note: This example returns a Pandas Series.

Example
Return row 0 and 1:
#use a list of indexes:
print(df.loc[[0, 1]])

Result
calories duration
0 420 50
1 380 40
Note: When using [], the result is a Pandas DataFrame.

Named Indexes
With the index argument, you can name your own indexes.

Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)

VIGNESH LS 11
DATA SCIENCE LABORATORY

Result
calories duration
day1 420 50
day2 380 40
day3 390 45

Locate Named Indexes


Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])

Result
calories 380
duration 40
Name: 0, dtype: int64

Load Files Into a DataFrame


If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example
Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df)

VIGNESH LS 12

You might also like