Machine Learning With Python-Python EcoSystem

The document provides a comprehensive guide on installing Python and setting up the Python ecosystem for machine learning, including options for individual installation and using Anaconda. It discusses essential libraries such as NumPy, Pandas, and Scikit-learn, along with the Jupyter Notebook environment, highlighting their functionalities and installation processes. The document emphasizes the importance of these tools in data science and machine learning applications.

Uploaded by

Kalighat Okira
Copyright
© © All Rights Reserved
Machine Learning with Python

Python EcoSystem

Prof. Shibdas Dutta,


Associate Professor,
DCG DATA CORE SYSTEMS INDIA PVT LTD
Kolkata

Company Confidential: Data-Core Systems, Inc. | datacoresystems.com


Installing Python
Before working in Python, we must first install it. You can
install Python in either of the following two ways:
• Installing Python individually
• Using a pre-packaged Python distribution: Anaconda
Let us discuss each of these in detail.
Installing Python Individually
If you want to install Python on your computer, you need to
download only the binary code applicable to your platform.
Python distributions are available for Windows, Linux and Mac
platforms.
On Windows platform

We can install Python on the Windows platform with the following steps:

· First, go to https://www.python.org/downloads/.
· Next, click on the link for the Windows installer python-XYZ.msi file.
Here XYZ is the version we wish to install.
· Now, run the downloaded file. It opens the Python install wizard,
which is easy to use. Accept the default settings and wait until the
installation is finished.



Using Pre-packaged Python Distribution: Anaconda
Anaconda is a packaged distribution of Python that includes the libraries widely used in Data Science. We can
follow these steps to set up a Python environment using Anaconda:

Step1: First, we need to download the required installation package from the Anaconda distribution. The link for
the same is https://www.anaconda.com/distribution/. You can choose from Windows, Mac and Linux OS as per
your requirement.

Step2: Next, select the Python version you want to install on your machine. At the time of writing, the latest
Python version was 3.7. Both 64-bit and 32-bit graphical installers are offered.

Step3: After you select the OS and Python version, the Anaconda installer is downloaded to your computer.
Double-click the file and the installer will install the Anaconda package.

Step4: To check whether it is installed, open a command prompt and type python. The interpreter should start
and print its version.
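Once the interpreter is running, the version can also be confirmed from Python itself. The following is a minimal sketch that uses only the standard library; the helper name python_version_string is an illustrative assumption, not part of the original slides.

```python
import sys


def python_version_string():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


# Show which interpreter is running and where it is installed.
print("Python version:", python_version_string())
print("Interpreter path:", sys.executable)
```

With Anaconda installed, the interpreter path would normally point inside the Anaconda installation folder.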



Why Python for Data Science?
Extensive set of packages
Python has an extensive and powerful set of packages that are ready to be used
in various domains. These include packages such as numpy, scipy, pandas and
scikit-learn, which are required for machine learning and data science.
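To see which of these packages are present in the current environment, a short check such as the following can help. It is a sketch using only the standard library; the function name check_packages is hypothetical, and note that scikit-learn is imported under the name sklearn.

```python
from importlib import util

# Import names of the packages mentioned above
# (scikit-learn is imported as "sklearn").
PACKAGES = ["numpy", "scipy", "pandas", "sklearn"]


def check_packages(names):
    """Map each import name to True if it can be found, else False."""
    return {name: util.find_spec(name) is not None for name in names}


for name, available in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```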



Components of Python ML Ecosystem
In this section, let us discuss some core Data Science libraries that form the components of
the Python machine learning ecosystem. These components make Python an important
language for Data Science. Though there are many such components, let us discuss some of
the important ones here:

Jupyter Notebook
Jupyter notebooks provide an interactive computational environment for developing
Python-based Data Science applications. They were formerly known as IPython notebooks. The
following are some of the features that make Jupyter notebooks one of the best
components of the Python ML ecosystem:
· Jupyter notebooks can illustrate the analysis process step by step by arranging
code, images, text, output etc. in sequence.

· They help a data scientist document the thought process while developing the
analysis.

· Results can be captured as part of the notebook.

· With the help of Jupyter notebooks, we can also share our work with peers.



Installation and Execution
If you are using the Anaconda distribution, you need not install Jupyter
notebook separately, as it is already installed with it. You just need to open the
Anaconda Prompt and type the following command:
C:\>jupyter notebook

After you press Enter, it starts a notebook server at localhost:8888 on your computer.
[Screenshot: notebook server running at localhost:8888]



Now, after clicking the New tab, you will get a list of options. Select Python 3 and it will take you
to a new notebook where you can start working.
[Screenshots: the New menu and a new Python 3 notebook]



On the other hand, if you are using the standard Python distribution, Jupyter
notebook can be installed using the popular Python package installer, pip.

pip install jupyter



Types of Cells in Jupyter Notebook
The following are the three types of cells in a Jupyter notebook:

Code cells: As the name suggests, we use these cells to write code. When a code cell is
run, its content is sent to the kernel that is associated with the notebook.

Markdown cells: We use these cells to annotate the computation process. They can
contain text, images, LaTeX equations, HTML tags etc.

Raw cells: The text written in them is displayed as it is. These cells are used to add
text that we do not wish to be converted by the automatic conversion mechanism of
Jupyter notebook.
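A notebook file is stored as JSON, and each cell records its type. The sketch below builds a small notebook structure by hand to show how the three cell types differ. It assumes the version 4 notebook layout; the helper make_cell and the cell contents are hypothetical, and in practice the nbformat package is the usual way to create notebook files.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dict (assumed nbformat 4 layout)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally record an execution counter and outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# Analysis notes\nText and LaTeX such as $a^2 + b^2 = c^2$."),
        make_cell("code", "print('this source is sent to the kernel')"),
        make_cell("raw", "Left exactly as typed."),
    ],
}

# Serialize the structure; this is the kind of text an .ipynb file contains.
text = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```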



NumPy
It is another useful component that makes Python one of the favorite
languages for Data Science. The name stands for Numerical Python, and the
library is built around multidimensional array objects. By using NumPy, we can
perform the following important operations:
· Operations associated with linear algebra.
· Mathematical and logical operations on arrays.
· Fourier transformation.
We can also see NumPy as a replacement for MATLAB, because NumPy is mostly used
along with SciPy (Scientific Python) and Matplotlib (plotting library).
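The three kinds of operations listed above can be sketched in a few lines. The arrays and values here are made up for illustration only.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Mathematical and logical operations work element-wise on arrays.
values = np.array([1, 5, 10, 15])
doubled = values * 2
is_large = values > 5

# Fourier transform of a constant signal: only the first bin is non-zero.
spectrum = np.fft.fft(np.ones(4))

print(x)
print(doubled, is_large)
print(spectrum)
```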

Installation and Execution

If you are using the Anaconda distribution, there is no need to install NumPy separately, as it is already installed with it. You
just need to import the package into your Python script as follows:
import numpy as np

On the other hand, if you are using the standard Python distribution, NumPy can be
installed using the popular Python package installer, pip.
pip install numpy

After installing NumPy, you can import it into your Python script as you did above.



Pandas
It is another useful Python library that makes Python one of the favorite
languages for Data Science. Pandas is used for data manipulation,
wrangling and analysis. It was developed by Wes McKinney in 2008. With the
help of Pandas, we can accomplish the following five steps in data
processing:
· Load
· Prepare
· Manipulate
· Model
· Analyze
Data representation in Pandas
The entire representation of data in Pandas is done with the help of the following three data structures:
Series: It is basically a one-dimensional ndarray with an axis label, which means it is like a
simple array with homogeneous data. For example, the following series is a collection of
integers 1,5,10,15,24,25…

1 5 10 15 24 25 28 36 40 89
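The series of integers above can be built as follows. The values come from the example; the letter labels are hypothetical and only illustrate what an axis label is.

```python
import pandas as pd

# The integers from the example above, with illustrative axis labels.
values = [1, 5, 10, 15, 24, 25, 28, 36, 40, 89]
labels = list("abcdefghij")  # hypothetical labels, one per value
s = pd.Series(values, index=labels)

print(s.dtype)    # one dtype for the whole Series (homogeneous data)
print(s["c"])     # look up a value by its axis label
print(s.iloc[0])  # or by integer position
```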



Data frame: It is the most useful data structure and is used for
almost all kinds of data representation and manipulation in
pandas.
It is basically a two-dimensional data structure which can
contain heterogeneous data.
Generally, tabular data is represented by using data frames.

For example, the following table shows the data of students
with their names, roll numbers, age and gender:

Name      Rollnumber   Age   Gender
Aarav     1            15    Male
Harshit   2            14    Male
Kanika    3            16    Female
Mayank    4            15    Male
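The student table above can be represented as a data frame like this. It is a minimal sketch; the variable name students is chosen here for illustration.

```python
import pandas as pd

# One dictionary key per column; each column may hold a different type.
students = pd.DataFrame(
    {
        "Name": ["Aarav", "Harshit", "Kanika", "Mayank"],
        "Rollnumber": [1, 2, 3, 4],
        "Age": [15, 14, 16, 15],
        "Gender": ["Male", "Male", "Female", "Male"],
    }
)

print(students)
print(students.dtypes)         # text columns and integer columns side by side
print(students["Age"].mean())  # column-wise computation
```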



Panel: It is a 3-dimensional data structure containing heterogeneous data. It is very difficult
to represent a panel graphically, but it can be illustrated as a container of
DataFrames. Note that Panel was deprecated and has since been removed from pandas (it is
not available in pandas 1.0 and later), so this description applies to older versions only.
The following table gives the dimension and description of the above-mentioned data
structures used in Pandas:

DataStructure   Dimension   Description
Series          1-D         Size immutable, 1-D homogeneous data
DataFrames      2-D         Size mutable, heterogeneous data in tabular form
Panel           3-D         Size-mutable array, container of DataFrame

We can understand these data structures as follows: a higher-dimensional data structure
is a container of the lower-dimensional one.
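The container relationship can be observed directly for the two structures that current pandas versions provide: each column of a DataFrame is a Series. The small data set below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Kanika"], "Age": [15, 16]})

# Selecting one column of the 2-D DataFrame yields a 1-D Series.
age = df["Age"]
print(type(age).__name__)

# A DataFrame can also be assembled from several Series that share an index.
rebuilt = pd.DataFrame({"Name": df["Name"], "Age": age})
print(rebuilt.equals(df))
```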



Installation and Execution
If you are using the Anaconda distribution, there is no need to install Pandas
separately, as it is already installed with it. You just need to import the package into
your Python script as follows:
import pandas as pd

On the other hand, if you are using the standard Python distribution, Pandas can be
installed using the popular Python package installer, pip.
pip install pandas

After installing Pandas, you can import it into your Python script as you did above.

Example

The following is an example of creating a series from ndarray by using Pandas:



In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: data = np.array(['g','a','u','r','a','v'])
In [4]: s = pd.Series(data)
In [5]: print(s)
0    g
1    a
2    u
3    r
4    a
5    v
dtype: object



Scikit-learn
Another useful and important Python library for Data Science and
machine learning is Scikit-learn. The following are some features of
Scikit-learn that make it so useful:

· It is built on NumPy, SciPy, and Matplotlib.

· It is open source and can be reused under the Berkeley Software Distribution (BSD) license.

· It is accessible to everybody and can be reused in various contexts.

· It implements a wide range of machine learning algorithms covering major areas of ML such as
classification, clustering, regression, dimensionality reduction and model selection.

Installation and Execution


If you are using the Anaconda distribution, there is no need to install Scikit-learn
separately, as it is already installed with it. You just need to import the package in your Python
script. For example, the following line imports the loader for the breast cancer
dataset from Scikit-learn:
from sklearn.datasets import load_breast_cancer

On the other hand, if you are using the standard Python distribution and already have
NumPy and SciPy, Scikit-learn can be installed using the popular Python package
installer, pip.
pip install scikit-learn

After installing Scikit-learn, you can use it in your Python script as shown
above.
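To show how the imported dataset loader fits into a typical workflow, here is a minimal sketch that loads the breast cancer data, holds out part of it and fits a classifier. The choice of a decision tree, the 25% test split and the random seeds are assumptions made for illustration; the slides do not prescribe a model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the features (X) and the class labels (y).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier; the model choice here is only illustrative.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(X.shape, round(accuracy, 3))
```

Any other Scikit-learn classifier could be substituted for the decision tree, since they share the same fit and score methods.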



Thank You
