Lab - Manual FDS
Lab - Manual FDS
DATA SCIENCE
LAB MANUAL
(Under Revision)
VIGNESH LS 1
DATA SCIENCE LABORATORY
VIGNESH LS 2
DATA SCIENCE LABORATORY
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CO5: Present and interpret data using visualization packages in Python.
VIGNESH LS 3
DATA SCIENCE LABORATORY
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages.
Aim
To Download and install python and its packages using pip installation
Procedure
Install Python Data Science Packages
Python is a high-level and general-purpose programming language with data science and
machine learning packages. Use the video below to install on Windows, MacOS, or Linux. As
a first step, install Python for Windows, MacOS, or Linux.
Python Packages
The power of Python is in the packages that are available either through the pip or conda
package managers. This page is an overview of some of the best packages for machine
learning and data science and how to install them.
We will explore the Python packages that are commonly used for data science and machine
learning. You may need to install the packages from the terminal, Anaconda prompt,
command prompt, or from the Jupyter Notebook. If you have multiple versions of Python or
have specific dependencies then use an environment manager such as pyenv. For most users,
a single installation is typically sufficient. The Python package manager pip has all of the
packages (such as gekko) that we need for this course. If there is an administrative access
error, install to the local profile with the --user flag.
Gekko
Gekko provides an interface to gradient-based solvers for machine learning and
optimization of mixed-integer, differential algebraic equations, and time series models.
Gekko provides exact first and second derivatives through automatic differentiation and
discretization with simultaneous or sequential methods.
Keras
Keras provides an interface for artificial neural networks. Keras acts as an interface for the
TensorFlow library. Other backend packages were supported until version 2.4. TensorFlow
is now the only backend and is installed separately with pip install tensorflow.
VIGNESH LS 4
DATA SCIENCE LABORATORY
Matplotlib
The package matplotlib generates plots in Python.
Numpy
Numpy is a numerical computing package for mathematics, science, and engineering. Many
data science packages use Numpy as a dependency.
OpenCV
OpenCV (Open Source Computer Vision Library) is a package for real-time computer vision
and developed with support from Intel Research.
Pandas
Pandas visualizes and manipulates data tables. There are many functions that allow efficient
manipulation for the preliminary steps of data analysis problems.
Plotly
Plotly renders interactive plots with HTML and JavaScript. Plotly Express is included with
Plotly.
PyTorch
PyTorch enables deep learning, computer vision, and natural language processing.
Development is led by Facebook's AI Research lab (FAIR).
Scikit-Learn
Scikit-Learn (or sklearn) includes a wide variety of classification, regression and clustering
algorithms including neural network, support vector machine, random forest, gradient
boosting, k-means clustering, and other supervised or unsupervised learning methods.
SciPy
SciPy is a general-purpose package for mathematics, science, and engineering and extends
the base capabilities of NumPy.
Seaborn
Seaborn is built on matplotlib, and produces detailed plots in few lines of code.
Statsmodels
Statsmodels is a package for exploring data, estimating statistical models, and performing
statistical tests. It include descriptive statistics, statistical tests, plotting functions, and result
statistics.
TensorFlow
TensorFlow is an open source machine learning platform with particular focus on training
and inference of deep neural networks. Development is led by the Google Brain team.
VIGNESH LS 6
DATA SCIENCE LABORATORY
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
To create an ndarray, we can pass a list, tuple or any array-like object into the array()
method, and it will be converted into an ndarray:
Example
Use a tuple to create a NumPy array:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.
Example
Create a 1-D array containing the values 1,2,3,4,5:
VIGNESH LS 7
DATA SCIENCE LABORATORY
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array. These are often used to
represent matrix or 2nd order tensors.
Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
These are often used to represent a 3rd order tensor.
Example
Create a 3-D array with two 2-D arrays, both containing two arrays with the values
1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Example
Check how many dimensions the arrays have:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
VIGNESH LS 8
DATA SCIENCE LABORATORY
print(b.ndim)
print(c.ndim)
print(d.ndim)
Example
Create an array with 5 dimensions and verify that it has 5 dimensions:
import numpy as np
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('number of dimensions :', arr.ndim)
In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element
that is the vector, the 3rd dim has 1 element that is the matrix with the vector, the 2nd dim
has 1 element that is 3D array and 1st dim has 1 element that is a 4D array.
VIGNESH LS 9
DATA SCIENCE LABORATORY
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Result
calories duration
0 420 50
1 380 40
2 390 45
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas use the loc attribute to return one or more specified row(s)
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
VIGNESH LS 10
DATA SCIENCE LABORATORY
Result
calories 420
duration 50
Name: 0, dtype: int64
Note: This example returns a Pandas Series.
Example
Return row 0 and 1:
#use a list of indexes:
print(df.loc[[0, 1]])
Result
calories duration
0 420 50
1 380 40
Note: When using [], the result is a Pandas DataFrame.
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
VIGNESH LS 11
DATA SCIENCE LABORATORY
Result
calories duration
day1 420 50
day2 380 40
day3 390 45
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
Result
calories 380
duration 40
Name: 0, dtype: int64
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
VIGNESH LS 12