Machine Learning with Python
Supervised Learning
Ligdi González
2019
INDEX
Chapter 1
INTRODUCTION
MACHINE LEARNING WITH PYTHON
Chapter 2
PREPARE THE PYTHON ENVIRONMENT
PYTHON
LIBRARIES FOR DATA SCIENCE
LIBRARIES FOR DATA VISUALIZATION
LIBRARIES FOR MACHINE LEARNING
INSTALL THE PYTHON ENVIRONMENT
WORKING WITH THE SPYDER IDE
WORKING WITH JUPYTER NOTEBOOK IDE
Chapter 3
INTENSIVE COURSES IN PYTHON AND SCIPY
INTENSIVE PYTHON COURSE
INTENSIVE NUMPY COURSE
INTENSIVE PANDAS COURSE
INTENSIVE MATPLOTLIB COURSE
Chapter 4
LOAD THE DATA
CONSIDERATIONS OF CSV FILES
DATASET: PIMA INDIANS DIABETES
LOAD CSV FILES USING PYTHON
LOAD OTHER FILE FORMATS USING PYTHON
Chapter 5
UNDERSTANDING THE DATA
CHECK THE DATA
DIMENSIONS OF THE DATA
DATA TYPES
STATISTICAL DESCRIPTION
CLASS DISTRIBUTION (CLASSIFICATION PROBLEMS)
CORRELATION BETWEEN CHARACTERISTICS
ADDITIONAL CONSIDERATIONS
Chapter 6
VISUALIZING THE DATA
GRAPHING WITH A SINGLE VARIABLE
GRAPHING WITH MULTIPLE VARIABLES
Chapter 7
DATA PROCESSING
DATA SEPARATION
STANDARDIZATION OF DATA
DATA NORMALIZATION
COLUMN REMOVAL
Chapter 8
SELECTING FEATURES
IMPORTANCE OF FEATURES
FILTER METHODS
WRAPPER METHODS
EMBEDDED METHODS
Chapter 9
CLASSIFICATION ALGORITHMS
LOGISTIC REGRESSION
K - NEAREST NEIGHBORS
SUPPORT VECTOR MACHINES
NAIVE BAYES
DECISION TREES CLASSIFICATION
RANDOM FORESTS CLASSIFICATION
Chapter 10
PERFORMANCE METRICS FOR CLASSIFICATION ALGORITHMS
CONFUSION MATRIX
CLASSIFICATION REPORT
AREA UNDER THE CURVE
Chapter 11
REGRESSION ALGORITHMS
LINEAR REGRESSION
POLYNOMIAL REGRESSION
SUPPORT VECTOR REGRESSION
REGRESSION DECISION TREES
RANDOM FOREST REGRESSION
Chapter 12
PERFORMANCE METRICS FOR REGRESSION ALGORITHMS
ROOT MEAN SQUARE ERROR (RMSE)
MEAN ABSOLUTE ERROR (MAE)
Chapter 13
MACHINE LEARNING PROJECT - CLASSIFICATION
DEFINITION OF THE PROBLEM
IMPORT THE LIBRARIES
LOAD THE DATASET
UNDERSTANDING THE DATA
VISUALIZING THE DATA
DATA SEPARATION
SELECTING FEATURES
DATA PROCESSING
SEPARATION OF THE DATA
APPLICATION OF CLASSIFICATION ALGORITHMS
Chapter 14
MACHINE LEARNING PROJECT - REGRESSION
DEFINITION OF THE PROBLEM
IMPORT THE LIBRARIES
LOAD THE DATASET
UNDERSTANDING THE DATA
VISUALIZING THE DATA
DATA SEPARATION
SEPARATION OF THE DATA
APPLICATION OF REGRESSION ALGORITHMS
Chapter 15
CONTINUE LEARNING MACHINE LEARNING
BUILD TEMPLATES FOR PROJECTS
DATA SETS FOR PRACTICE
Chapter 16
GET MORE INFORMATION
GENERAL ADVICE
HELP WITH PYTHON
HELP WITH SCIPY AND NUMPY
HELP WITH MATPLOTLIB
HELP WITH PANDAS
HELP WITH SCIKIT-LEARN
Chapter 1
INTRODUCTION
This book is a guide to learning applied Machine Learning with Python. It uncovers, step by step, the process you can use to get started in Machine Learning with the Python ecosystem.
MACHINE LEARNING WITH PYTHON
This book focuses on Machine Learning projects with the Python programming language. It is written to explain each of the steps needed to develop a Machine Learning project with classification and regression algorithms. It is organized as follows:
Learn how the subtasks of a Machine Learning project map to Python and the best way to work through each task.
Finally, all of the knowledge explained is brought together to develop a classification problem.
In the first chapters, you will learn how to complete the specific subtasks of a Machine Learning project using Python. Once you learn how to complete a task using the platform, you can obtain a reliable result that can be used over and over again, project after project.
A Machine Learning project can be divided into 4 tasks:
Define the problem: investigate and characterize the problem to better understand the objectives of the project.
Analyze the data: use descriptive statistics and visualization to better understand the data that is available.
Prepare the data: use data transformations to better expose the structure of the prediction problem to the Machine Learning algorithms.
Evaluate the algorithms: design tests to evaluate a standard set of algorithms on the data and select the best ones to investigate further.
In the last part of the book, everything learned is brought together in the development of two projects: one with classification algorithms and the other with regression algorithms.
Each of the lessons is designed to be read from beginning to end, in order, and shows exactly how to complete each task in a Machine Learning project. Of course, you can return to specific chapters later to refresh your knowledge. The chapters are structured to demonstrate the libraries and functions and to show specific techniques for a Machine Learning task.
Each chapter is designed to be completed in less than 30 minutes, depending on your skill level and enthusiasm. It is possible to work through the whole book in one weekend. It also works, if you prefer, to dive into specific chapters and use the book as a reference.
Chapter 2
PREPARE THE PYTHON ENVIRONMENT
Python is growing and may become the dominant platform for Machine Learning. The main reason to adopt this programming language is that it is a general-purpose language that can be used both for research and development and in production.
In this chapter, you will learn everything related to the Python environment for Machine Learning:
1. Python and its use for Machine Learning.
2. The basic libraries for Machine Learning, such as SciPy, NumPy and Pandas.
3. The library for data visualization, matplotlib.
4. The library for implementing Machine Learning algorithms, scikit-learn.
5. Set up the Python environment using Anaconda.
6. Set up the Spyder IDE.
7. Set up the Jupyter Notebook IDE.
PYTHON
Python leads among Machine Learning development languages due to its simplicity and ease of learning. Python is used by more and more data scientists and developers for building and analyzing models. In addition, it is a success among beginners who are new to Machine Learning.
Unlike other programming languages used for Machine Learning, like R or MATLAB, data processing and scientific mathematical expressions are not built into the language itself; instead, libraries like SciPy, NumPy and Pandas offer equivalent functionality in arguably more accessible syntax.
Specialized Machine Learning libraries like scikit-learn, Theano and TensorFlow provide the ability to train a variety of Machine Learning models, potentially using distributed computing infrastructure.
LIBRARIES FOR DATA SCIENCE
These are the basic libraries that transform Python from a language of
general-purpose programming in a powerful and robust tool
for data analysis and visualization, these are the foundations on which
they are based on the most specialized tools.
SciPy
SciPy is a software library for engineering and science. It includes functions for linear algebra, optimization, integration and statistics, and provides efficient numerical routines such as numerical integration and optimization through its specific submodules. The main functionality of this library is built on NumPy and its arrays, adding a collection of high-level algorithms and commands to manipulate and visualize data.
NumPy
NumPy stands for Numerical Python and is a fundamental library for scientific computing in Python, since it provides vectorization of mathematical operations on the array type, improving performance and, consequently, speeding up execution.
It is focused on managing and treating data as arrays; its purpose is to make it easy to perform the complex matrix operations required by neural networks and advanced statistics.
In summary, NumPy provides objects for multidimensional arrays and matrices, as well as routines that allow developers to perform advanced mathematical and statistical functions on those arrays with as little code as possible.
NumPy is a data management library that is usually paired with TensorFlow, SciPy, matplotlib and many other Python libraries oriented towards Machine Learning and data science.
Pandas
Pandas is a Python library designed to work with "labeled" and "relational" data in a simple and intuitive way; it is designed for quick and easy data manipulation, aggregation and visualization.
Pandas adds data structures and tools that are designed for practical data analysis in finance, statistics and engineering. It works well with incomplete, unordered and unlabeled data, that is, the kind of data you are likely to find in the real world, and provides tools to shape, merge, reshape and split data sets. With this library you can easily add and remove columns from a DataFrame, convert data structures into objects and handle missing data (NaN).
LIBRARIES FOR DATA VISUALIZATION
The best and most sophisticated analysis makes no sense if you can't
communicate it to other people. These libraries allow you to easily create
more visually appealing and sophisticated graphs, tables, and maps, without
import what type of analysis you are trying to do.
Matplotlib
Matplotlib is the standard Python library for creating 2D diagrams and graphs. It is quite low level, which means it requires more commands to generate nice graphs and figures than some more advanced libraries; however, the other side of that is flexibility: with enough commands you can make almost any type of graph you want with matplotlib.
LIBRARIES FOR MACHINE LEARNING
Machine Learning sits at the intersection of Artificial Intelligence and statistical analysis. By training computers with real-world datasets, algorithms can be created that make more precise and sophisticated predictions, whether that means obtaining better driving directions or building computers that can identify landmarks simply by looking at images.
Scikit-learn
The scikit-learn library is definitely one of the most popular for Machine Learning. It has a large number of features for data mining and data analysis, which makes it an excellent option both for researchers and for developers. It is built on top of the popular NumPy, SciPy and matplotlib libraries, so it will feel familiar when you use it.
Scikit-learn exposes a concise and consistent interface to the common Machine Learning algorithms, which makes it easy to bring into production systems. The library combines quality code and good documentation, ease of use and high performance, and it is the de facto industry standard for Machine Learning with Python.
INSTALL THE PYTHON ENVIRONMENT
There are multiple ways to install the Python environment, the IDEs and the libraries used for Machine Learning, but the easiest way, especially if you are just learning and do not know how to work comfortably in the command terminal, is to use the Anaconda environment.
You just have to go to the official page, anaconda.com, and press the 'Downloads' button. There, select the operating system that the computer uses. Then select the version of Python to use; it is recommended to select version 3.6, since most Machine Learning libraries are developed for this version, while the other one is becoming more obsolete every day.
Once it has been downloaded, proceed to install it on the computer; you must have a little patience because this whole process sometimes takes time.
At the end of all this, Python will be installed on the computer, along with a large part of the libraries used in Machine Learning and even several IDEs that can be used for project development.
WORKING WITH THE SPYDER IDE
Spyder can be found in the Anaconda browser, 'Anaconda-Navigator'. To open Spyder, you just need to press 'Launch' on its application box.
Once you start Spyder, you should see an editor window open on the left side and a Python console window at the bottom right. The panel located in the upper right holds a variable explorer, a file explorer and a help browser. Like most IDEs, you can change which panels are visible and their layout within the window.
You can start working with Spyder immediately in the console window. By default, Spyder provides an IPython console that can be used to interact directly with the Python engine. It works essentially the same way as the command line; the big difference is that Spyder can inspect the contents of the Python engine and can do things such as display variables and their contents within the variable explorer.
WORKING WITH JUPYTER NOTEBOOK IDE
You can find Jupyter Notebook in the Anaconda browser, Anaconda-Navigator, just like Spyder.
Once you choose to open Jupyter, a tab opens in the computer's default web browser. From here you can navigate through the computer's folders to decide where the project document will be saved. A new Notebook must be created in Python 3; a new tab opens at this point, and this is where you must start developing the project.
For the purposes of the book, all lines of code and their results are produced using Jupyter Notebook as the IDE, because its presentation of the data is ideal for ease of understanding, but you can select any IDE of your preference.
Chapter 3
INTENSIVE COURSES IN PYTHON AND SCIPY
To develop Machine Learning algorithms in Python, it is not necessary to be an expert Python developer; just having the basics of programming in any language is enough to understand Python very easily. You only need to know some properties of the language and transfer what you already know to Python.
In this chapter you will learn:
1. Basic syntax of Python.
2. Basic knowledge of programming with the NumPy, Pandas and Matplotlib libraries.
3. The foundations for building Machine Learning tasks in Python.
If you already know something about Python, this chapter will be a knowledge refresher.
INTENSIVE PYTHON COURSE
Python is a powerful computational tool that can be used to solve complicated tasks in the areas of finance, economics, data science and Machine Learning.
The technical advantages that Python offers compared to other programming languages are the following:
It is free and constantly updated.
It can be used in multiple domains.
It does not require much time to process calculations, and it has an intuitive syntax that allows complex lines of code to be programmed.
With all these features, Python can be implemented in countless applications.
Therefore, Python's popularity rests on two pillars: the first is that it is easy to learn, since it has a clear and intuitive syntax, and the second is that it is very powerful, since it can execute a wide variety of complex code.
INTENSIVE NUMPY COURSE
NumPy is a Python package that stands for 'Numerical Python'. It is the main library for scientific computing and provides an array data structure that has some benefits over regular Python lists.
The NumPy array is a powerful N-dimensional array object shaped as rows and columns, in which the elements are stored in their respective memory locations. In the figure this can be seen more clearly: it is a two-dimensional array because it has rows and columns. As can be seen, it has four rows and three columns; if it only had one row, it would have been a one-dimensional array.
Following the explanation above, the first figure shows a one-dimensional or 1D array. The second figure shows a two-dimensional or 2D array, where the rows are indicated as axis 0, while the columns are axis 1.
The number of axes increases with the number of dimensions; 3D arrays have an additional axis 2. These axes only make sense for arrays that have at least two dimensions, since they are not meaningful for one-dimensional arrays.
Among the differences between NumPy arrays and the lists that Python offers for handling data, NumPy takes less memory than Python lists, is quite fast in terms of execution, and does not require much programming effort to run its routines. For these reasons, NumPy is much easier and more convenient to use when developing Machine Learning algorithms.
Knowing all this, two arrays are created, one one-dimensional and the other two-dimensional, using NumPy of course.
With the first instruction, the Python program is told that from now on np will be the reference for everything related to NumPy. Arrays are declared within NumPy in a very simple way.
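A minimal sketch of what this might look like (the variable names a and b are illustrative):

import numpy as np

a = np.array([1, 2, 3])               # one-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])  # two-dimensional array with 2 rows and 3 columns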
NumPy contains several instructions to obtain more information about the arrays created with this library; some of them are the following:
The data type of the elements stored in an array can be found with the 'dtype' attribute, which returns the data type along with its size.
In the same way, the size and shape of the array can be found with the 'size' and 'shape' attributes, respectively.
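A possible sketch, assuming the arrays a and b created above:

print(a.dtype)   # data type of the elements, for example int64
print(b.size)    # total number of elements, here 6
print(b.shape)   # (rows, columns), here (2, 3)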
Another operation that can be performed with NumPy is changing the size and shape of arrays. Changing the shape of an array means changing the number of rows and columns, which gives a new view of the object.
In the image, 3 columns and 2 rows have been converted into 2 columns and 3 rows using the "reshape" instruction.
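A possible sketch of the reshape operation (the values are illustrative):

c = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows and 3 columns
d = c.reshape(3, 2)                   # now 3 rows and 2 columns, same elements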
Another thing that can be done with NumPy is selecting a single element of an array; for this you only need to specify the row and column number where it is located, and the value stored in that location is returned.
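A possible sketch, assuming the two-dimensional array b created above:

print(b[1, 2])   # element in row index 1 and column index 2, here 6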
Arithmetic operations can also be done very easily with the NumPy library. You can find the minimum and maximum values and the sum of the array with 'min', 'max' and 'sum', respectively.
More operations can be performed with NumPy arrays, such as the addition, subtraction, multiplication and division of two arrays.
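A possible sketch of these operations (the arrays are illustrative):

print(b.min(), b.max(), b.sum())   # minimum, maximum and sum of the array
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y, x - y, x * y, x / y)  # element-wise arithmetic between two arrays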
This covers a large part of the functions used to manipulate data with NumPy. There are still many more instructions, which you will keep learning as you program; remember that the official website of this library offers very good information about all the functions it provides, so do not hesitate to visit it.
INTENSIVE PANDAS COURSE
Pandas is an open-source Python library that provides high-performance data analysis and manipulation tools through its powerful data structures. The name Pandas is derived from the term 'Panel Data' and it is the data analysis library of Python.
Using this library, the five typical steps in data processing and analysis can be carried out, regardless of the origin of the data: load, prepare, manipulate, model and analyze.
A DataFrame is the fundamental structure of Pandas; it is a two-dimensional labeled data structure with columns of potentially different types. Pandas DataFrames consist of three main components: the data, the index and the columns.
Additionally, with the Pandas DataFrame structure you can specify the index and column names. The index distinguishes the rows, while the column names distinguish the columns. These components are very useful when the data needs to be manipulated.
The main features of the Pandas library are:
A fast and efficient DataFrame object with default and custom indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns can be deleted from or inserted into a data structure.
Group-by functionality for aggregation and transformations.
High-performance merging and joining of data.
Time series functionality.
Like any other Python library, it must be imported into the program; in this case the alias pd is usually used, just as the NumPy library is loaded as np.
Creating DataFrames is the first step in any Machine Learning project with Python, so to create a data frame from a NumPy array, the array must be passed to the DataFrame() function in the data argument.
If you look at the result of this code, the selected elements of the NumPy array are used to build the DataFrame: first the values that appear in the lists starting with Row1 and Row2 are selected, then the index or row labels Row1 and Row2, and finally the column names Col1 and Col2.
The way this DataFrame is created will be the same for all structures.
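A minimal sketch of this construction (the values are illustrative):

import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(data=data, index=['Row1', 'Row2'], columns=['Col1', 'Col2'])
print(df)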
Once the DataFrame is created, it can be explored with all the instructions that Pandas offers. The first thing to do is to find the shape of the data, using the shape attribute. With this instruction you can learn the dimensions of the DataFrame, that is, its width and height.
On the other hand, the len() function can be used in combination with the index attribute to find the height of the DataFrame.
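A possible sketch, assuming the DataFrame df created above:

print(df.shape)       # (rows, columns) of the DataFrame
print(len(df.index))  # height, that is, the number of rows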
For more information, you can review the following information:
http://bit.ly/2NFsYCa
INTENSIVE COURSE ON MATPLOTLIB
Matplotlib is a plotting library used for 2D graphs in the Python programming language. It is very flexible and has many built-in defaults that help a lot in daily work. It also does not take much to get started: you just have to make the necessary imports, prepare some data, and then you can start plotting with the help of the plot() instruction.
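A minimal sketch of this workflow (the data is illustrative):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)   # prepare some data
plt.plot(x, np.sin(x))        # plot it on the current axes
plt.show()                    # display the figure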
When graphing, the following considerations should be taken into account:
The figure is the window or general page where everything is drawn; it is the top-level component of everything considered in the next points. Multiple independent figures can be created. A figure can have other elements, such as a suptitle (a centered figure title), a legend, a color bar, among others.
The axes are added to the figure. The axes are the area on which the data is plotted with functions such as plot() and scatter(), and they can have associated labels. Figures can contain multiple axes.
Each axes object has an x-axis and a y-axis, and each of them contains a numbering. There are also the axis labels, the title and the legend, which must be taken into account when you want to customize the axes, bearing in mind that the axis scales and grid lines can also be useful.
The spines are the lines that connect the axis tick marks and designate the boundaries of the data area; in other words, they are the simple square you can see when the axes have been initialized, as in the following image:
As can be observed, the right and top spines are configured as invisible.
Chapter 4
LOAD THE DATA
The first step when starting a Machine Learning project is to load the data. There are many ways to do this, but the first thing to look at is the format in which the data is found. It is generally found in CSV files, so here it will be explained how to load a CSV file in Python, although it will also be shown how to do it in case you have another format.
In this chapter you will learn:
1. Load CSV files using Python.
2. Load other file formats using Python.
CONSIDERATIONS OF CSV FILES
There are several considerations that must be taken into account when loading data from a CSV file; some of them are explained below.
File header
Sometimes the data has a header; this helps with the automatic assignment of names to each data column. If this header is not available, the attributes can be named manually. Either way, you must explicitly specify whether or not the file to be used has a header.
Delimiter
The standard delimiter used to separate the values in the fields is the comma (,). The file to be used could use a different delimiter, like a blank space or semicolon (;); in these cases it must be specified explicitly.
DATASET: PIMA INDIANS DIABETES
The Pima Indians dataset will be used to demonstrate data loading in this chapter. It will also be used in many of the upcoming chapters.
The objective of this dataset is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. In particular, all the patients here are women at least 21 years old of Pima Indian heritage.
The data is available for free on the Kaggle page under the name 'Pima Indians Diabetes Database'. To download the file you must be registered on Kaggle.
For more information, you can review the following information:
http://bit.ly/2wqG0g6
LOAD CSV FILES USING PYTHON
To load the CSV data, the Pandas library and the "pandas.read_csv()" function will be used. This function is very flexible and is the quickest and easiest way to load data for Machine Learning. The function returns a Pandas DataFrame that can be used immediately.
In the following example, it is assumed that the file diabetes.csv is saved in the current working directory.
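A minimal sketch of how this load might look, assuming the DataFrame is stored in a variable called data:

import pandas as pd

data = pd.read_csv('diabetes.csv')   # file in the current working directory
# the same function also accepts a web address in place of the file name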
You can also load the data directly from a web address; for this you just need to include the web address within the parentheses in place of the file name.
LOAD OTHER FILE FORMATS USING PYTHON
In case the Machine Learning files are not in CSV format, the Pandas library offers the option to load data in different formats. Below is a list of formats and the functions that should be used:
Format      Function
HTML        read_html
MS Excel    read_excel
JSON        read_json
SQL         read_sql
Chapter 5
UNDERSTANDING THE DATA
Once the data is loaded, it must be understood in order to obtain better results. This chapter will explain several ways in which Python can be used to better understand Machine Learning data.
In this chapter you will learn:
1. Look at the raw data.
2. Check the dimensions of the dataset.
3. Review the data types of the features.
4. Check the distribution of the classes in the dataset (for classification problems).
5. Analyze the data using descriptive statistics.
6. Understand the relationships in the data using correlations.
7. Review the skew of the distribution of each feature.
To implement each of these items, the dataset used in the previous chapter, corresponding to diabetes in the Pima Indians, will be used.
CHECK THE DATA
Looking at the raw data can reveal information that cannot be obtained in any other way. It can also suggest ideas on how to better process and handle the data for Machine Learning tasks.
To visualize the first 5 rows of the data, the head() function of the Pandas DataFrame is used.
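A possible sketch, assuming the dataset was loaded into a DataFrame called data:

print(data.head())   # first 5 rows of the dataset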
Result once the previous code is executed:
DIMENSIONS OF THE DATA
To work in Machine Learning, you must have good control over the amount of data available, knowing the number of rows and columns.
If there are too many rows, some algorithms may take too long to train. Instead, if there are very few, there may not be enough data to train the algorithms.
If there are too many features, some algorithms can be distracted or suffer poor performance due to the dimensionality.
To understand the shape and size of the dataset, the Pandas shape attribute is used.
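A possible sketch, using the same DataFrame data:

print(data.shape)   # (number of rows, number of columns)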
Result after running the above code:
The result of this code shows the number of rows and then the columns. Therefore, the dataset contains 768 rows and 9 columns.
DATA TYPES
Knowing the data type of each feature is important. This information can give an idea of whether the original data needs to be converted into other formats so that it is easier to use with Machine Learning algorithms.
To find out the data types, the dtypes attribute of Pandas can be used; this way the data type of each feature in the dataset can be known.
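A possible sketch, using the same DataFrame data:

print(data.dtypes)   # data type of each column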
Result once the above code has been executed:
The result is a list with each of the columns of the dataset, specifying the data type being handled.
STATISTICAL DESCRIPTION
The statistical description refers to the information that can be obtained about the statistical properties of each feature. To implement it, you only need to use the describe() function from Pandas, and the properties it returns are the following:
Count
Mean
Standard deviation
Minimum value
25%
50%
75%
Maximum value
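A possible sketch, using the same DataFrame data:

print(data.describe())   # count, mean, std, min, 25%, 50%, 75% and max per column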
Result once the above code is executed:
As can be observed, this returns a lot of information about the dataset, but the important thing here is to verify whether all the data counts match; that way you can tell whether or not there is any missing data.
CLASS DISTRIBUTION (CLASSIFICATION PROBLEMS)
When working on a classification problem, as is the case here, it is necessary to know how balanced the class values are. Highly imbalanced problems (many more observations for one class than for another) are common and may require special treatment during data processing. To find out this information, the Pandas groupby function is used together with the class column of the dataset being worked on.
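A possible sketch, assuming the class column is called 'Outcome', as in the Kaggle version of the dataset:

print(data.groupby('Outcome').size())   # number of observations in each class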
Result after executing the above code:
Class 0 corresponds to people without diabetes, while class 1 corresponds to the women who have the disease.
CORRELATION BETWEEN CHARACTERISTICS
Correlation refers to the relationship between two variables and whether or not they change together. To calculate it, the method must be indicated; the most common is the Pearson correlation coefficient, which assumes a normal distribution of the attributes involved. A correlation of -1 or 1 shows a complete negative or positive correlation, respectively, while a value of 0 shows no correlation at all. To calculate a correlation matrix, the Pandas corr() function is used.
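A possible sketch, using the same DataFrame data:

print(data.corr(method='pearson'))   # Pearson correlation matrix between the features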
Result once the above code is executed:
Some Machine Learning algorithms, such as linear and logistic regression, can suffer from poor performance if there are highly correlated attributes in the dataset.
ADDITIONAL CONSIDERATIONS
The points discussed here are just some of the advice to consider when reviewing the data; in addition to these, the following should also be taken into account:
Review the numbers. Generating the statistical description is not enough. Take time to read and understand the numbers you are seeing very well.
When looking at the numbers, it is necessary to understand how and why you are seeing these specific numbers and how they relate to the problem domain in general. The important thing here is to ask questions and understand very well what information the dataset presents.
It is advisable to write down all the observations or ideas that arise while analyzing the data. At times this information will be very important when trying to come up with new things to try.
Chapter 6
VISUALIZING THE DATA
As explained in the previous chapter, the data must be understood in order to achieve good results when implementing Machine Learning algorithms. Another way to learn from the data is by visualizing it.
In this chapter you will learn:
1. Graphing with a single variable
2. Graphing with multiple variables
GRAPHING WITH A SINGLE VARIABLE
Graphs with a single variable are useful for understanding each feature of the dataset independently. The following will be explained here:
Histograms
Box plots
Histograms
A quick way to get an idea of the distribution of each attribute is to look at the histograms. These present the distribution and relationships of a single variable within a set of features. Histograms group the data into bins, and from their shape one can quickly get an idea of whether an attribute is Gaussian, skewed or even has an exponential distribution. They can also help to spot possible outliers.
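A possible sketch, using the same DataFrame data:

import matplotlib.pyplot as plt

data.hist()   # one histogram per numeric column
plt.show()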
Result once the previous code is executed:
Box plots
Another way to check the distribution of each attribute is to use box plots. These summarize the distribution of each attribute, drawing a line for the median (the middle value) and a box around the 25th and 75th percentiles.
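A possible sketch, using the same DataFrame data:

data.plot(kind='box', subplots=True, layout=(3, 3), sharex=False, sharey=False)
plt.show()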
Result after executing the previous code:
GRAPHING WITH MULTIPLE VARIABLES
This section provides an example of a chart that shows the interactions between multiple variables in the dataset:
Correlation matrix
The correlation gives an indication of how related the changes between two variables are. A correlation is positive when two variables change in the same direction, and negative when they change in opposite directions, one going up while the other goes down. The correlation matrix contains the correlation between each pair of attributes; in this way, it can be verified which variables have a high correlation with each other.
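A possible sketch of this chart, using the same DataFrame data:

import matplotlib.pyplot as plt

correlations = data.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(correlations, vmin=-1, vmax=1)   # color-coded correlation matrix
fig.colorbar(cax)
plt.show()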
Result once the previous code is executed:
If the matrix is analyzed, it is symmetric, that is, the bottom left of the matrix is the same as the top right. This is useful because we can see two different views of the same data in one graph.
For more information, you can review the following information:
http://bit.ly/2XgTTOg
Chapter 7
DATA PROCESSING
Data processing is a fundamental step in any Machine Learning problem, because the algorithms make assumptions about the data, so the data must be presented to them correctly.
In this chapter you will learn how to prepare data for Machine Learning in Python using scikit-learn:
1. Standardize the data
2. Normalize the data
3. Column removal
DATA SEPARATION
In this chapter, several methods for processing data for Machine Learning will be explained. The dataset that will be used is the Pima Indians diabetes dataset.
Before starting the data processing, it is advisable to separate the dataset into the input and output variables or, as they are also known, the independent and dependent variables.
Looking at the dataset that has been worked with, the output or dependent variable would be the 'Outcome' column, because it reflects whether or not the person has diabetes.
Having defined this, we proceed to separate the data, creating the variable 'X' for the input or independent data and the variable 'y' for the column corresponding to 'Outcome'.
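A minimal sketch of this separation, assuming the dataset is in a DataFrame called data:

X = data.drop('Outcome', axis=1)   # independent (input) variables
y = data['Outcome']                # dependent (output) variable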
Result after executing the previous code:
Result once the previous code has been executed:
Data processing will only be done on the set corresponding to the variable 'X', the input or independent data. The data in the variable 'y' will not undergo any procedure.
STANDARDIZATION OF THE DATA
Data standardization is a common requirement for many Machine Learning estimators; this is because the algorithms may behave badly if the individual features do not look more or less like standard normally distributed data.
To carry out this procedure, the StandardScaler class from the scikit-learn library is used and applied, in this case, to all the data in the dataset 'X'.
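A possible sketch, assuming the input data is in the variable X defined above:

from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X)   # each column rescaled to mean 0 and standard deviation 1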
Result once the above code has been executed:
Note: when running this code a warning will probably appear; the message indicates that the integer data has been converted to float in order to apply the StandardScaler instruction.
As can be seen when running the code, it generates a NumPy array with the standardized data. If you want to convert the data into a Pandas DataFrame, the following additional lines of code must be executed.
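A possible sketch of that conversion:

import pandas as pd

X_scaled = pd.DataFrame(X_scaled, columns=X.columns)   # back to a labeled DataFrame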
Result once the above code is executed:
DATA NORMALIZATION
Normalizing the data refers to rescaling each observation (row) so that it has a length of 1. This processing method can be useful for sparse datasets (many zeros) with attributes of varying scales.
To implement this method, the Normalizer class from the scikit-learn library is used.
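A possible sketch, again applied to the variable X:

from sklearn.preprocessing import Normalizer

X_normalized = Normalizer().fit_transform(X)   # each row rescaled to unit length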
Result after executing the above code:
Just like in the previous case, executing the code generates a NumPy array with the normalized data; if you want to convert the data into a Pandas DataFrame, you must run the same additional lines of code as before.
Result once the previous code is executed:
COLUMN REMOVAL
Another way to process the data is by removing columns with data that is not necessary for the analysis. The function used for this is drop, together with the name of the column to be deleted.
To carry out the respective test, the 'BloodPressure' column will be removed from the dataset being worked on.
Result once the previous code is executed:
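A possible sketch of the instruction described above:

data = data.drop('BloodPressure', axis=1)   # remove the BloodPressure column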
Chapter 8
SELECTING FEATURES
The data features used to train Machine Learning models have a great influence on the performance that can be achieved. Irrelevant or partially relevant features can negatively affect the performance of the model. This chapter describes automatic feature selection techniques that can be used to prepare the data.
In this chapter you will learn:
1. Filter methods
2. Wrapper methods
3. Embedded methods
IMPORTANCE OF FEATURES
Datasets can sometimes be small, while others are extremely large, especially when they have a large number of features, which makes them very difficult to process.
When you have this type of high-dimensional dataset and use all of the features to build Machine Learning models, it can cause the following:
The additional features act as noise, so the Machine Learning model can have extremely low performance.
The model takes longer to train.
Unnecessary resources are allocated to these features.
For all this, feature selection must be implemented in Machine Learning projects.
FILTER METHODS
The following image better describes filter-based feature selection methods:
Filter methods are generally used as a data processing step; the feature selection is independent of any Machine Learning algorithm.
Features are ranked according to statistical scores that tend to determine their correlation with the outcome variable. Correlation is a very contextual term and varies from one work to another.
The following table can be used to define the correlation coefficients for different types of data, in this case continuous and categorical.
Pearson correlation: used as a measure to quantify the linear dependence between two continuous variables X and Y; its value varies from -1 to +1.
LDA: linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes, or levels, of a categorical variable.
ANOVA: stands for analysis of variance and is similar to LDA, except that it operates on one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal or not.
Chi-square: a statistical test applied to groups of categorical features to evaluate the probability of correlation or association between them using their frequency distribution.
Filter methods do not eliminate multicollinearity, so you must also deal with it before training models on your data.
Putting into practice what was explained above, a Chi-square statistical test for non-negative features will be used to select 5 of the best features of the dataset we have been working with, related to diabetes in the Pima Indians.
The scikit-learn library provides the SelectKBest class, which can be used with a range of different statistical tests to select a specific number of features; in this case the test is Chi-square.
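A possible sketch, assuming X and y were separated as in the previous chapter:

from sklearn.feature_selection import SelectKBest, chi2

test = SelectKBest(score_func=chi2, k=5)   # keep the 5 features with the highest Chi-square score
fit = test.fit(X, y)
print(fit.scores_)                         # Chi-square score of every feature
X_selected = fit.transform(X)              # dataset reduced to the 5 chosen features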
Result once the previous code is executed:
The result obtained in this part of the program is the score of each of the features once the Chi-square method is applied; based on this, the most important features that can affect the Machine Learning analysis can now be identified.
Result once the previous code has been executed:
In summary, the results obtained show the 5 chosen features, those with the highest scores, among them 'Pregnancies' (number of pregnancies) and 'BMI' (body mass index). These scores help determine the best features for training the model.
For more information, you can review the following information:
http://bit.ly/2C0BojO
WRAPPER METHODS
As with the filter methods, a graph is shown to better explain this method:
As can be seen, a wrapper method requires a Machine Learning algorithm and uses its performance as the evaluation criterion. This method searches for the set of features that is best suited to the algorithm and aims to improve its performance.
In other words, a subset of features is used to train a model; based on the inferences drawn from that model, it is decided whether to add or remove features from the subset. The problem essentially reduces to a search problem. These methods are usually computationally very expensive.
Some common examples of wrapper methods are the following:
Forward selection: an iterative method that starts with no features in the model. In each iteration, the feature that best improves the model keeps being added, until adding a new variable no longer improves the performance of the model.
Backward elimination: it starts with all the features, and the least significant feature is removed in each iteration, which improves the performance of the model. This is repeated until no improvement is observed when removing features.
Recursive feature elimination (RFE): an optimization algorithm that seeks to find the best-performing subset of features. It repeatedly creates models and sets aside the best or worst performing feature in each iteration. It builds the next model with the remaining features until all the features are exhausted, and then ranks the features according to the order of their elimination.
Applying this theory to the dataset that has been worked with so far, the recursive feature elimination method will be applied together with the logistic regression algorithm.
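A possible sketch, assuming the same X and y as before:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
fit = rfe.fit(X, y)
print(fit.support_)   # True for the selected features
print(fit.ranking_)   # 1 for the selected features, larger numbers were eliminated earlier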
Result once the above code is executed:
For this method, the 4 main features were chosen, and the result included 'Pregnancies', 'BMI' (body mass index) and 'DiabetesPedigreeFunction' (diabetes pedigree function).
As can be seen, these features are marked as True in the selected feature mask and as 1 in the feature ranking array.
EMBEDDED METHODS
Embedded methods combine the qualities of filter and wrapper methods. They are implemented through algorithms that have their own built-in feature selection methods.
Some of the most popular examples of these methods are LASSO and RIDGE regression, which have built-in penalty functions to reduce overfitting.
Chapter 9
CLASSIFICATION ALGORITHMS
When developing a Machine Learning project, you cannot know in advance which algorithms are the most suitable for the problem; you must try several methods and focus on those that prove to be the most promising. In this chapter, it will be explained how to implement Machine Learning algorithms that can be used to solve classification problems in Python with scikit-learn.
In this chapter, you will learn how to implement the following algorithms:
1. Logistic regression
2. K nearest neighbors
3. Support vector machines
4. Naive Bayes
5. Decision tree classification
6. Random forest classification
For this analysis, the Pima Indians diabetes dataset will be used. It is also assumed that the theoretical part of each Machine Learning algorithm and how to use it is known, so the parameterization of each algorithm will not be explained.
LOGISTIC REGRESSION
Logistic Regression is a statistical method for predicting binary classes. The outcome or target variable is dichotomous in nature; dichotomous means that there are only two possible classes. For example, it can be used for cancer detection problems or to calculate the probability of an event occurring.
Logistic Regression is one of the simplest and most commonly used Machine Learning algorithms for binary classification. It is easy to implement and can be used as a baseline for any binary classification problem. Logistic Regression describes and estimates the relationship between a binary dependent variable and the independent variables.
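A minimal sketch of one way to do this, assuming X and y were separated as in Chapter 7; the train/test split shown here is one possible choice, not necessarily the exact code used in the book:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)   # define the algorithm
model.fit(X_train, y_train)                 # train it with the training data
y_pred = model.predict(X_test)              # make a prediction on the test data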
Result once the above code is executed:
To implement this algorithm, you only need to define it, carry out its training with the training data and make a prediction.
K - NEAREST NEIGHBORS
K nearest neighbors is a non-parametric learning algorithm, which means that it makes no assumptions about the underlying data distribution. In other words, the structure of the model is determined from the dataset. This is very useful in practice, where most real-world datasets do not follow theoretical mathematical assumptions.
The algorithm keeps all the training data and uses it in the testing phase. This makes training faster and the testing phase slower and more expensive. Expensive here refers to time and memory: in the worst case, K nearest neighbors needs more time to scan all the data points, and scanning all the data points requires more memory to store the training data.
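A possible sketch, reusing the train/test split from the logistic regression example:

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)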
Result after executing the previous code:
As can be seen here, you simply need to define the algorithm, train the model with the training data and make a prediction.
SUPPORT VECTOR MACHINES
Support vector machines look for the line that best separates two classes. The data points that are closest to the line that best separates the classes are called support vectors and influence the location of the line. Of particular importance is the use of different kernel functions through the kernel parameter.
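A possible sketch, reusing the same train/test split:

from sklearn.svm import SVC

model = SVC(kernel='rbf')   # the kernel parameter selects the kernel function
model.fit(X_train, y_train)
y_pred = model.predict(X_test)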
Result once the previous code has been executed:
For this algorithm, likewise, the model must be defined, trained and used to make a prediction.
NAIVE BAYES
Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the simplest Machine Learning algorithms. It is a fast, accurate and reliable algorithm, and it maintains high accuracy and speed even on large datasets.
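A possible sketch, reusing the same train/test split:

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)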
Result once the previous code is executed:
Just like the previous cases, for this algorithm you define the algorithm to implement, then train it using the training data and finally make a prediction, either with the test data or with new data.
DECISION TREES CLASSIFICATION
A decision tree is a type of supervised learning algorithm that is mainly used in classification problems, although it works for both categorical and continuous input and output variables.
In this technique, the data is split into two or more homogeneous sets based on the most significant differentiator among the input variables. The decision tree identifies the most significant variable and the value of it that yields the most homogeneous sets of the population. All the input variables and all possible split points are evaluated, and the one with the best result is chosen.
Tree-based learning algorithms are considered one of the best and most used supervised learning methods. Tree-based methods produce predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well.
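A possible sketch, reusing the same train/test split:

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)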
Result once the above code is executed:
To implement this algorithm, you only need to define it, carry out its training with the training data and make a prediction.
RANDOM FORESTS CLASSIFICATION
Random forest classification is a versatile Machine Learning method. It handles dimensionality reduction, missing values, outliers and other essential data exploration steps, and it does a pretty good job. It is a type of ensemble learning method, in which a group of weak models is combined to form a powerful model.
In a random forest, several trees are grown instead of a single tree. To classify a new object based on its attributes, each tree gives a classification, and the tree is said to "vote" for that class. The forest chooses the classification with the most votes among all the trees in the forest.
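A possible sketch, reusing the same train/test split:

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100)   # number of trees in the forest
model.fit(X_train, y_train)
y_pred = model.predict(X_test)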
Result once the above code has been executed:
This algorithm is no different from the previous ones: here the algorithm is defined, the model is trained and the respective prediction is made.
All the algorithms can be configured by defining certain parameters to improve the performance of the model. All these parameters are described in the scikit-learn library documentation.
Chapter 10
PERFORMANCE METRICS FOR CLASSIFICATION ALGORITHMS
For Machine Learning classification problems, there is a large number of metrics that can be used to evaluate the predictions for these problems. This section will explain how to implement several of these metrics.
In this chapter you will learn:
1. Confusion matrix
2. Classification report
3. Area under the curve
CONFUSION MATRIX
The confusion matrix is one of the most intuitive and straightforward metrics used to assess the correctness and accuracy of a model. It is used for classification problems where the output can be two or more types of classes.
The confusion matrix is a table with two dimensions. The rows of the matrix indicate the observed or actual class and the columns indicate the predicted classes.
It should be clarified that the confusion matrix itself is not a performance measure as such, but almost all performance metrics are based on it and on the numbers it contains.
Below is an example of the calculation of a confusion matrix for the Pima Indians diabetes dataset using the Logistic Regression algorithm.
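A possible sketch, assuming y_test and the predictions y_pred from the logistic regression example above:

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_pred))   # rows: actual class, columns: predicted class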
Result once the previous code is executed:
The confusion matrix shows us that most of the predictions fall on the main diagonal of the matrix, these being the correct predictions.
CLASSIFICATION REPORT
The scikit-learn library provides a very convenient report when working on classification problems, which gives a quick idea of the accuracy of a model using a series of measures. The classification_report() function shows the precision, recall, F1 score and support for each class.
The following example shows the implementation of this function on a problem.
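A possible sketch, using the same y_test and y_pred:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))   # precision, recall, F1 score and support per class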
Result once the previous code has been executed:
AREA UNDER THE CURVE
For a classification problem, performance can also be measured with the AUC-ROC curve. This is one of the most important evaluation metrics for checking the performance of any classification model. ROC stands for receiver operating characteristic and AUC for area under the curve.
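A possible sketch, using the same y_test and y_pred:

from sklearn.metrics import roc_auc_score

print(roc_auc_score(y_test, y_pred))   # 1.0 is a perfect classifier, 0.5 is no better than chance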
Result once the previous code has been executed:
The value obtained is relatively close to 1 and greater than 0.5, which suggests that most of the predictions made have been correct.
Chapter 11
REGRESSION ALGORITHMS
Spot-checking is a way to discover which algorithms work well on a Machine Learning problem. It is difficult to know in advance which algorithm is the most suitable to use, so several methods must be checked, focusing on those that prove to be the most promising. This chapter will explain how to implement regression algorithms using Python and scikit-learn for regression problems.
In this chapter you will learn how to implement the following algorithms:
1. Linear regression
2. Polynomial regression
3. Support vector regression
4. Regression decision trees
5. Random forest regression
For this analysis, the advertising dataset will be used, which you can find on the Kaggle page as "Advertising Data". Likewise, it is assumed that the theoretical part of each Machine Learning algorithm and how to use it is known, so neither the basics nor the parameterization of each algorithm will be explained.
LINEAR REGRESSION
Linear regression is a parametric technique used to predict continuous, dependent variables given a set of independent variables. It is parametric because it makes certain assumptions about the dataset. If the dataset follows those assumptions, the regression gives incredible results; otherwise, it struggles to provide convincing accuracy.
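A minimal sketch of one way to do this; the column names TV, Radio, Newspaper and Sales are assumptions about the Kaggle "Advertising Data" file and may differ in your copy:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

data = pd.read_csv('Advertising.csv')
X = data[['TV', 'Radio', 'Newspaper']]   # independent variables (assumed column names)
y = data['Sales']                        # dependent variable (assumed column name)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)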
Result once the previous code is executed:
As can be seen here, it is simply necessary to define the algorithm, train the model with the training data and make a prediction.
For more information, you can check the following information:
http://bit.ly/2RpwDGK
POLYNOMIAL REGRESSION
Polynomial regression is a special case of linear regression; it extends the linear model by adding additional predictors, obtained by raising each of the original predictors to a power. The standard way to extend linear regression to a non-linear relationship between the dependent and independent variables has been to replace the linear model with a polynomial function.
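A possible sketch, reusing the train/test split from the linear regression example:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=2)          # define the degree of the polynomial
X_train_poly = poly.fit_transform(X_train)   # transform the original predictors
X_test_poly = poly.transform(X_test)

model = LinearRegression()
model.fit(X_train_poly, y_train)
y_pred = model.predict(X_test_poly)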
Result after running the above code:
For this algorithm, an additional step must be taken before defining the model: first, the degree of the polynomial must be defined and the data corresponding to X must be transformed. Having done all this, the model can now be defined, trained and finally used to make a prediction.
SUPPORT VECTOR REGRESSION
Support vector regression uses the same principles as support vector classification, with only a few minor differences. First of all, since the output is a real number, it becomes very difficult to predict from the available information, which has infinite possibilities; however, the main idea is always the same: minimize the error by finding the hyperplane that maximizes the margin, bearing in mind that part of the error is tolerated.
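A possible sketch, reusing the same train/test split and scaling the inputs first, as the text below notes:

from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = SVR(kernel='rbf')
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)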
Result after executing the above code:
For this algorithm, likewise, the model must be defined and trained and a prediction made. For this dataset, the data is scaled beforehand, since the model was not entirely effective with the original data.
REGRESSION DECISION TREES
Decision trees are a supervised learning technique that predicts response values by learning decision rules derived from the features.
Decision trees work by splitting the feature space into several simple rectangular regions, divided by axis-parallel splits. To obtain a prediction for a particular observation, the mean or mode of the responses of the training observations within the partition to which the new observation belongs is used.
Result after running the above code:
Just like the previous cases, for this algorithm the algorithm is defined as
implement, then it is trained using the data from
training and finally a prediction is made, either with the data
for testing or with new data.
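A minimal sketch, assuming the same train/test splits as before:

from sklearn.tree import DecisionTreeRegressor

# Define the algorithm, train the model and make a prediction
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)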
For more information, you can review the following information:
RANDOM FOREST REGRESSION
Random forests is a supervised learning algorithm that, as its name suggests, creates a forest of trees in a somewhat random way. To put it simply: the random forest builds multiple decision trees and combines them to obtain a more accurate and stable prediction. In general, the more trees in the forest, the more robust it is.
In this algorithm, additional randomness is added to the model while the trees are grown: instead of looking for the most important feature when splitting a node, it looks for the best feature among a random subset of features. This results in a wide diversity that generally leads to a better model.
Result once the previous code is executed:
This algorithm is no different from the previous ones; the algorithm is defined, the model is trained and the respective prediction is made.
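A minimal sketch, assuming the same splits; the number of trees shown is only an illustrative value:

from sklearn.ensemble import RandomForestRegressor

# Define the algorithm with 100 trees, train the model and make a prediction
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)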
For more information, you can review the following information:
Remember that this algorithm, like the previous ones, can be configured by defining certain parameters to improve the performance of the model. All these parameters are documented in the scikit-learn library.
Chapter 12
PERFORMANCE METRICS FOR REGRESSION ALGORITHMS
In this chapter, the most common metrics for evaluating predictions on Machine Learning regression problems will be reviewed.
In this chapter you will learn:
1. Root Mean Squared Error (RMSE)
2. Mean Absolute Error (MAE)
3. R² (coefficient of determination)
ROOT MEAN SQUARE ERROR (RMSE)
The most commonly used metric for regression tasks is the
mean squared error and represents the square root of the distance
mean squared error between the actual value and the predicted value.
Indicate the absolute fit of the model to the data, how close the points are.
of observed data from the predicted values of the model. The error
Mean square error or RMSE is an absolute measure of fit.
Result once the previous code is executed:
The best value for this parameter is 0.0.
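A minimal sketch of computing the RMSE with scikit-learn, assuming y_test and the model's predictions y_pred are available:

import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE is the square root of the mean squared error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(rmse)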
For more information, you can review the following information:
MEAN ABSOLUTE ERROR (MAE)
The mean absolute error is the average of the absolute differences between the predicted values and the observed values. It is a linear score, which means that all individual differences are weighted equally in the average.
Result after executing the above code:
For this metric, a value of 0.0 indicates that there is no error, that is, the predictions are perfect.
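A minimal sketch, under the same assumptions as before:

from sklearn.metrics import mean_absolute_error

# Average of the absolute differences between predictions and observations
mae = mean_absolute_error(y_test, y_pred)
print(mae)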
For more information, you can review the following information:
http://bit.ly/2TzzKNY
R² (COEFFICIENT OF DETERMINATION)
The R² metric indicates the goodness or suitability of the model. It is often used for descriptive purposes and shows how well the selected independent variables explain the variability in the dependent variable.
Result after executing the previous code:
The best possible score is 1.0, and it can be negative, because the model can be arbitrarily worse. A constant model that always predicts the expected value of y, without taking the input features into account, would receive a score of 0.0.
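A minimal sketch, under the same assumptions:

from sklearn.metrics import r2_score

# 1.0 is a perfect fit; a constant model scores 0.0
r2 = r2_score(y_test, y_pred)
print(r2)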
For more information, you can review the following information:
Chapter 13
MACHINE LEARNING PROJECT - CLASSIFICATION
In this chapter, a classification project will be developed using Python, including each step of the Machine Learning process applied to the problem.
In this chapter, you will learn:
1. How to work through a classification problem
2. How to use data transformations to improve the
model performance
3. How to implement Machine Learning algorithms and
compare their results
DEFINITION OF THE PROBLEM
For the project, the same dataset that has been used while working through the chapters will be utilized, but this time the project will be developed completely. The dataset corresponds to the Pima Indians.
This dataset describes the medical records of the Pima Indians and whether each patient developed diabetes within five years. As such, it is a classification problem.
The dataset can be found on the Kaggle page as
Pima Indians Diabetes Database
IMPORT THE LIBRARIES
The first step in any Machine Learning project is to import the libraries that will be used. It is normal not to know all the necessary libraries at first and to add them little by little, but at the very least one should start with the basics that are usually needed to implement any Machine Learning project.
Libraries can be imported as they are needed while programming, but what I recommend is to include all of these lines of code at the beginning of the program, to easily know which modules are being used within the program and to keep the programming organized.
For now, the Pandas, NumPy and matplotlib libraries will be imported.
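A minimal sketch of these imports, using the aliases that are conventional for these libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt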
LOAD THE DATASET
Once the libraries have been imported, they can be used inside the program; for this, the dataset is loaded.
The dataset must be downloaded directly from the Kaggle page and saved on the computer in a specific folder, where the dataset file will sit next to the Python program file being developed; this way the programming will be easier.
Remember that to download files from Kaggle you must be registered on the page; this is completely free and gives you access to a large number of datasets with which you can practice later.
The exact name of the file containing the dataset must be specified. It is recommended to save the data in the same folder as the program file; this way order is maintained.
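A minimal sketch of this step; the file name 'diabetes.csv' is only an assumption and must match the name of the file actually downloaded from Kaggle:

# Load the dataset from the same folder as the program
data = pd.read_csv('diabetes.csv')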
UNDERSTANDING THE DATA
The next step in the development of a Machine Learning project is understanding the data at hand. For this step, several functions available in the Pandas library are used.
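A minimal sketch of these checks; the name of the class column, 'Outcome', is the one used in the Kaggle version of the dataset and is an assumption here:

print(data.head())                       # first rows of the data
print(data.shape)                        # dimensions of the dataset
print(data.dtypes)                       # data type of each column
print(data.describe())                   # statistical description
print(data.groupby('Outcome').size())    # class distribution
print(data.corr())                       # correlation between features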
Result after executing the previous code:
The dataset consists of 9 columns, of which 8 are the independent variables and the last column corresponds to the dependent variable. Additionally, all the data is numerical, a mix of integers and floats.
Additionally, the data is balanced, so no additional steps are necessary.
VISUALIZING THE DATA
Once the data has been understood numerically, it will now be analyzed visually using the matplotlib library. This library was imported at the beginning of the program, so it is not necessary to do it again.
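A minimal sketch of some single-variable plots that could be used here:

# Histograms of each feature
data.hist()
plt.show()

# Box plots of each feature
data.plot(kind='box', subplots=True, layout=(3, 3), sharex=False)
plt.show()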
Result after executing the previous code:
As we can see in the obtained graphs, the results are
very similar to those obtained in the previous point.
DATA SEPARATION
Knowing the dataset being worked on within the project, the data is separated into two sets: a first set with the independent variables, X, and a second set with the dependent variable, y.
First, the data corresponding to the independent variables is separated, which would be all columns except the last one.
Result after executing the above code:
Next, the variable y, or dependent variable, is defined, which would be the last column of the dataset.
Result after executing the previous code:
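Putting both steps together, a minimal sketch (assuming the dataset was loaded into the DataFrame called data) could be:

# Independent variables: all columns except the last one
X = data.iloc[:, :-1].values
# Dependent variable: the last column
y = data.iloc[:, -1].values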
SELECTING FEATURES
As observed, the variable X has 8 columns; in this case, a procedure will be applied to select the 5 features that have the greatest influence on the dependent variable. For this example, the filter method is used.
To implement the respective functions, the corresponding libraries must be imported. It is recommended to place these lines of code at the beginning of the program, along with the other libraries being used. This makes the program much cleaner and easier for other people to understand.
Once this is done, the respective code can be implemented.
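A minimal sketch of a filter method with scikit-learn; the chi-squared score used here is only one common choice for this kind of selection:

from sklearn.feature_selection import SelectKBest, chi2

# Keep the 5 features most related to the dependent variable
selector = SelectKBest(score_func=chi2, k=5)
fit = selector.fit(X, y)
print(fit.scores_)

# Reduce X to only the selected columns
X = fit.transform(X)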
Result after executing the previous code:
The 5 features that have the greatest impact on the dependent variable are the following:
Column 0 – pregnancies
Column 1 - glucose
Column 4 – insulin
Column 5 - bmi
Column 7 – age
Therefore, the variable X is now converted so that it contains only these 5 columns.
Result once the previous code has been executed:
DATA PROCESSING
Having defined the values of X that will be used in the algorithms, the respective processing of the data can now be carried out.
In this case, the data will only be standardized, since the features are on different scales and this can cause errors in the analysis.
Before applying the function, the respective library must be imported. Remember to place this line of code at the beginning of the program.
Now the data standardization is carried out; it is important to mention that this procedure is performed only on the data corresponding to X.
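A minimal sketch using scikit-learn's StandardScaler:

from sklearn.preprocessing import StandardScaler

# Standardization is applied to X only
scaler = StandardScaler()
X = scaler.fit_transform(X)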
Result after executing the previous code:
DATA SEPARATION
We reach the last step before implementing the Machine Learning algorithms: separating the data into training and test sets. For this, the train_test_split function from the scikit-learn library is used; remember that it must be imported before using it.
Only 25% of the dataset will be used as test data, so the test size is set to 0.25.
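A minimal sketch of this split:

from sklearn.model_selection import train_test_split

# 25% of the data is reserved for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)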
APPLICATION OF CLASSIFICATION ALGORITHMS
The algorithms explained previously will be implemented, and the results will be evaluated with respect to the error of each one of them.
Logistic Regression
The first algorithm to evaluate is the most basic of all: Logistic Regression.
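A minimal sketch, assuming the training and test splits from the previous step:

from sklearn.linear_model import LogisticRegression

# Define the algorithm, train the model and make a prediction
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)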
Result once the previous code is executed:
The previous data is a comparison of the original test data with the values obtained from the model. Next, the respective analysis is performed with the metrics corresponding to classification algorithms.
The first metric to evaluate will be the confusion matrix; for this, the library is imported first and the function applied afterwards.
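A minimal sketch of this evaluation:

from sklearn.metrics import confusion_matrix

# Compare the real test labels against the model's predictions
print(confusion_matrix(y_test, y_pred))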
Result once the previous code has been executed:
The results obtained here are quite satisfactory, as the model was able to obtain good results.
Now the results are evaluated again, but this time using the classification report available in the scikit-learn library.
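A minimal sketch:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))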
Result after running the above code:
This report shows that the precision, recall and F1 score of the model hover around 0.8, which is a very good number.
K-Nearest Neighbors
The next algorithm to be evaluated will be K-Nearest Neighbors; the same procedure as before will be carried out.
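A minimal sketch; the number of neighbors shown is only an illustrative value:

from sklearn.neighbors import KNeighborsClassifier

# Define the algorithm, train the model and make a prediction
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)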
Result after executing the previous code:
After completing this procedure, the model is evaluated. For this, the two metrics used with the previous algorithm are applied; first, the confusion matrix.
Result after running the above code:
And subsequently, the classification report is obtained.
Result once the previous code is executed:
As can be seen, the results obtained here are a bit
better than those obtained with the previous algorithm.
Support Vector Machines
Now the support vector machine algorithm will be evaluated; the procedure is very similar to the other algorithms evaluated previously.
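A minimal sketch; the RBF kernel is only one common choice:

from sklearn.svm import SVC

# Define the algorithm, train the model and make a prediction
model = SVC(kernel='rbf')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)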
Result once the above code is executed:
As in the previous cases, the obtained model is evaluated by means of the confusion matrix and the classification report.
Result after executing the above code:
Result after executing the previous code:
Naive Bayes
The next algorithm to be evaluated will be naive Bayes or, as it is known in Spanish, the naive Bayesian classifier.
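A minimal sketch using the Gaussian variant, which is a common choice for numerical features:

from sklearn.naive_bayes import GaussianNB

# Define the algorithm, train the model and make a prediction
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)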
Result once the previous code is executed:
Once the model is defined, its error is checked; for this purpose, the confusion matrix function and the classification report are used.
Result once the previous code has been executed:
Result once the above code is executed:
Classification Decision Trees
To evaluate the problem by implementing the classification decision tree algorithm, the following procedure is carried out.
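A minimal sketch:

from sklearn.tree import DecisionTreeClassifier

# Define the algorithm, train the model and make a prediction
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)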
Result once the previous code is executed:
The error is evaluated by implementing the confusion matrix and obtaining the
classification report.
Result after executing the previous code:
Result once the above code is executed:
Random Forest Classification
Finally, the last algorithm explained, random forest classification, is evaluated.
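A minimal sketch; the number of trees is only an illustrative value:

from sklearn.ensemble import RandomForestClassifier

# Define the algorithm, train the model and make a prediction
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)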
Result once the previous code is executed:
The functions to evaluate the algorithm are implemented below; just as before, the confusion matrix and the classification report are used.
Result after executing the previous code:
Result once the previous code is executed:
As can be seen, the evaluated algorithms obtained very similar results to each other.
In these cases, the algorithm that is faster and easier to implement is selected, which would be Logistic Regression.
Not all cases are like this; sometimes, when working with a classification problem, the results obtained after implementing several algorithms differ from each other, so in that case the algorithm that obtains the best results is selected.
Chapter 14
MACHINE LEARNING PROJECT - REGRESSION
In this chapter, a regression project will be developed using Python.
Each step of the Machine Learning process applied to the problem is included.
In this chapter you will learn:
1. How to work through a regression problem
2. How to use data transformations to improve the
model performance
3. How to implement Machine Learning algorithms and
compare their results
DEFINITION OF THE PROBLEM
For the project, the same dataset that was used to explain the regression algorithms will be utilized, but this time the project will be developed completely. The dataset corresponds to the advertising data.
The dataset can be found on the Kaggle page as Advertising Data.
IMPORT THE LIBRARIES
The first step in any Machine Learning project is to import the libraries that will be used. It is normal not to know all the necessary libraries at first and to add them little by little, but at the very least one should start with the basics that are usually needed to implement any Machine Learning project.
Libraries can be imported as they are needed while programming, but what I recommend is to place all of these lines of code at the beginning of the program, so that it is easy to know which modules are being used within the program and to keep the programming organized.
For now, the Pandas and matplotlib libraries will be imported.
LOAD THE DATASET
Once the libraries have been imported, they can be used inside the program; for this, the dataset is loaded.
The dataset must be downloaded directly from the Kaggle page and saved on the computer in a specific folder, where the dataset file will sit next to the Python program file being developed; this way the programming will be easier.
Remember that to download files from Kaggle you must be registered on the page; this is completely free and gives you access to a large number of datasets with which you can practice later.
The exact name of the file containing the dataset must be specified. It is recommended to save the data in the same folder as the program file; this way order is maintained.
UNDERSTANDING THE DATA
The next step in the development of a Machine Learning project is understanding the available data. For this step, several functions available in the Pandas library are used.
Result after executing the previous code:
The dataset contains 5 columns, of which 4 are the independent variables and the last column corresponds to the dependent variable.
Looking at the data in detail, it can be observed that the first column is a row index, so it can be discarded when the data separation is made.
It can be observed that all the programming code used up to here is exactly the same as in the classification problem developed in the previous chapter.
VISUALIZING THE DATA
Once the data has been understood numerically, it will now be analyzed visually using the matplotlib library. This library was imported at the beginning of the program, so it is not necessary to do it again.
Result after executing the previous code:
As we can see in the obtained graphs, the results are
very similar to those obtained in the previous point.
It can be observed that all the programming code used up to here is exactly the same as in the classification problem developed in the previous chapter.
DATA SEPARATION
Knowing the dataset being worked on within the project, the data is separated into two sets: a first set with the independent variables, X, and a second set with the dependent variable, y.
We start by separating the data corresponding to the independent variables, which would be all the columns except the first and the last.
The first column is removed because it contains the row numbering, information that does not influence the Machine Learning analysis.
Result after executing the above code:
Next, the variable y, or dependent variable, is defined, which would be the last column of the dataset.
Result after executing the previous code:
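Putting both steps together, a minimal sketch (assuming the dataset was loaded into a DataFrame called data) could be:

# Independent variables: drop the first column (row index) and the last column
X = data.iloc[:, 1:-1].values
# Dependent variable: the last column
y = data.iloc[:, -1].values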
For this problem, feature selection is no longer necessary, since the dataset has very few features, so this step will be skipped.
Similarly, the data is already suitable for use within the Machine Learning algorithms, so it is not necessary to carry out any processing on it.
DATA SEPARATION
We have reached the final step before implementing the Machine Learning algorithms: separating the data into training and test sets. For this, the train_test_split function from the scikit-learn library is used; remember that it must be imported before using it.
Only 25% of the dataset will be used as test data, so the test size is set to 0.25.
APPLICATION OF REGRESSION ALGORITHMS
The algorithms explained earlier will be implemented, and the results will be evaluated with respect to the error of each of them.
Linear Regression
The first algorithm to evaluate is the most basic of all: Linear Regression.
Result after executing the above code:
The previous data is a comparison of the original test data with the values obtained by the model. Next, the respective analysis is performed with the metrics corresponding to regression algorithms.
The first metric to evaluate will be the mean squared error; for this, the library is imported first and the function applied afterwards.
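A minimal sketch of this evaluation, together with the R² metric that is used right afterwards:

from sklearn.metrics import mean_squared_error, r2_score

# Error of the model on the test data: for MSE lower is better,
# while for R² a value closer to 1 is better
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))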
Result once the above code is executed:
The results obtained here are quite satisfactory; remember that a value closer to 0 indicates that the model is good.
Now the results are evaluated again, but this time using the R² metric, importing the function from scikit-learn first and then implementing it.
Result after executing the previous code:
For this metric, unlike the previous one, the closer the result is to 1, the better the model, so the result obtained indicates that the developed model is good.
Polynomial Regression
The next algorithm to be evaluated will be polynomial regression; here the same procedure that was performed earlier will be carried out.
Result once the above code is executed:
After completing this procedure, the model is evaluated. For this, the two metrics used with the previous algorithm are applied; first, the mean squared error.
Result after executing the above code:
And subsequently, the R² metric is obtained.
Result once the previous code has been executed:
As can be seen, the results obtained here are very similar.
to the previous model.
Support Vector Regression
Now the support vector regression algorithm is being evaluated,
the procedure is very similar to the other evaluated algorithms
previously.
Result once the previous code is executed:
Just like in the previous cases, the obtained model is evaluated by
mean of the mean squared error and the metric.
Result once the previous code is executed:
Result once the above code is executed:
If we observe the results of the evaluation metrics, they are not for
nothing good, so it can be inferred that this is not the best algorithm
for this dataset.
Decision Trees Regression
The following algorithm to be evaluated will be regression decision trees.
Result once the previous code is executed:
Once the model is defined, its error is verified, for which the mean squared error function and the R² metric are used.
Result after executing the previous code:
Result after executing the above code:
Random Forest Regression
To evaluate the problem by implementing the random forest regression algorithm, the following procedure is carried out.
Result after executing the previous code:
The error is evaluated by implementing the mean squared error and the R² metric.
Result once the above code is executed:
Result after executing the above code:
As can be seen, the evaluated algorithms obtained very similar results to each other, except for support vector regression. This does not mean that this algorithm is bad, but simply that it is not the most suitable for this dataset.
Any algorithm that achieved a good result can be chosen; in case a fast and easy algorithm is required, Linear Regression or Polynomial Regression can be chosen.
Chapter 15
CONTINUE LEARNING MACHINE LEARNING
In this chapter, you will find areas where you can practice the newly acquired Machine Learning skills with Python.
In this chapter you will learn:
How to find datasets to work on new projects
BUILD TEMPLATES FOR PROJECTS
Throughout the book, it has been explained how to develop Machine Learning projects using Python. What is explained here is the foundation that can be used to start new Machine Learning projects. This is just a start, and you can improve as you develop bigger and more complicated projects.
As you apply what is explained here and put into practice the Machine Learning skills acquired using the Python platform, you will develop experience and skill with new and different techniques in Python, making it easier to develop projects in this area.
DATA SETS FOR PRACTICE
The important thing for improving the skills learned from this book is to keep practicing. On the web, you can find several datasets that you can use to practice what you have learned and, at the same time, to build a project portfolio to showcase in your resume.
The first place where you can find datasets to use in Machine Learning projects is the UCI Machine Learning repository (University of California, Irvine). The datasets in this repository are standardized, relatively clean, well understood and excellent for use as practice datasets.
With this repository, it is possible to build and further develop skills in the area of Machine Learning. It is also useful to start creating a portfolio that can be shown to future employers, demonstrating that you are capable of delivering results on Machine Learning projects using Python.
The link to this repository is the following:
http://archive.ics.uci.edu/ml/index.php
Another page that has very good datasets for practice is Kaggle. Here, you can download data and practice the skills you have learned, but you can also participate in competitions. In a competition, the organizer provides you with a training dataset, a test dataset on which you must make predictions, a performance measure and a deadline. The competitors then work to create the most accurate model possible. Winners often receive cash prizes.
These competitions often last weeks or months and can be a lot of fun. They also offer a great opportunity to test your skills with Machine Learning tools on datasets that often require a lot of cleaning and preparation. A good place to start would be the beginner competitions, as they are usually less challenging and have a lot of help in the form of tutorials to get started.
The main website to find the datasets is the following:
https://www.kaggle.com/datasets
No matter which page is used to obtain the datasets, the steps to follow to improve Machine Learning skills are the following:
1. Browse the list of free datasets in the repository and download some that seem interesting.
2. Use the libraries and functions learned in this book to work through the dataset and develop a model.
3. Write down the workflow and the conclusions obtained so that they can be consulted later and, if possible, share this information on some website; a recommended place is GitHub.
Working on small projects is a good way to practice the fundamentals. Sometimes some problems will become too easy, so one should seek new challenges and leave the comfort zone in order to keep increasing Machine Learning skills.
Chapter 16
GET MORE INFORMATION
This book is just the beginning of your journey to learn Machine Learning with Python. As new projects are developed, you may need help. This chapter indicates some of the best sources of help for Python and Machine Learning that you can find.
In this chapter you will learn:
1. Useful websites to clear up doubts.
GENERAL ADVICE
In general, on the LigdiGonzalez page you will find much more information about Machine Learning, from theoretical information to practical exercises, so that you can improve your skills in this topic. All the information contained there is in Spanish.
For more information, you can review the following information:
Similarly, the official documentation for Python and SciPy is excellent. Both the user guides and the API documentation are an excellent help to clarify doubts; there you will find the most complete description of the deeper configuration you can explore. The published information is in English.
Another very useful resource is question-and-answer sites, such as StackOverflow. You can search for error messages and problems you are having and find code examples and ideas that can help in your projects. This page is available in both Spanish and English, although the largest amount of information is found in the latter.
HELP WITH PYTHON
Python is a multi-purpose programming language. The more you learn about it, the better you will be able to use it. If you are relatively new to the Python platform, here are some valuable resources to go a step deeper:
Official Python 3 Documentation:
https://docs.python.org/3/
HELP WITH SCIPY AND NUMPY
It is a good idea to familiarize yourself with the broader SciPy ecosystem, so it is advisable to review the SciPy lecture notes and the NumPy documentation, especially when there are problems with these libraries.
SciPy Notes:
http://scipy-lectures.org/
Official NumPy documentation:
https://numpy.org/doc/
HELP WITH MATPLOTLIB
Graphically displaying data is very important in Machine Learning, so you can review the official matplotlib documentation, where many examples can be found, with their respective code, that can be quite useful to adapt in personal projects.
Official matplotlib documentation:
https://matplotlib.org/
HELP WITH PANDAS
Pandas has a wealth of documentation. The examples presented in the official documentation are very useful, as they will give you ideas about different ways to slice and transform the data.
Official Pandas Documentation:
http://pandas.pydata.org/pandas-docs/stable/
HELP WITH SCIKIT-LEARN
The documentation published on the scikit-learn website is of great help when developing Machine Learning projects; reviewing the configuration of each function can help improve projects and get better results.
Official scikit-learn documentation:
https://scikit-learn.org/stable/