Computer Programming - 2 Books in 1 - Machine Learning For Beginners + Python For Beginners
Jason Knox
1. Machine Learning for Beginners
Introduction
Chapter 1: What is Machine Learning – and the Evolution of Machines
A Quick History of Computer Science
The Evolution of Machines
The Evolution of Artificial Intelligence
Machine Learning
Training and Test Data
Machine Learning vs. Artificial Intelligence and Deep Learning
What Sectors and Industries Use Machine Learning
Government
Financial Services
Transportation and Shipment
Mining, Oil and Gas
Retail and Online Marketing
Healthcare
Chapter 2: Introduction to Data Science
What is Data Science?
Is Data Science Really a Thing?
The Role of Machine Learning in Data Science
The Tasks of Data Science
Predictive Analysis
Prescriptive Analytics
Pattern Recognition
Classification and Anomaly Detection
Examples of Data Science in Use
Data Science as a Career
Chapter 3: Supervised Machine Learning
Supervised Learning
Selecting an Algorithm
Bias Errors
Variance Errors
Noisy Data
Classification vs. Regression
Variations on Supervised Learning
Chapter 4: Unsupervised Learning
Clustering
K-Means Clustering
Mean Shift Clustering
Neural Networks
Markov Algorithm
Chapter 5: Reinforcement Learning
Types of Reinforcement Learning
The State Model of Reinforcement Learning
Short Term Rewards
When Are Rewards Given
Episodes
Continuous Tasks
Q Tables
Chapter 6: Algorithms for Supervised Learning
Linear Regression
Logistic Regression
Decision Trees
Random Forest
Nearest Neighbor
Chapter 7: Tips for Your Work in Machine Learning
Understand the Difference Between Prediction and Classification
Knowing Which Type of Learning to Use
Data Selection
Tools to Use
Practice
Utilize Mixed Learning Models
Have Realistic Expectations
Chapter 8: The Future of Machine Learning
Conclusion
Introduction
Machines have been a central part of human life for centuries, if not eons.
The complexity of machines has varied, but in all cases machines have
extended the human mind and helped to automate tedious tasks, helping to
free up human beings to do other things.
Of course, this hasn’t always been welcomed. In the early 19th century, riots ensued when machines automated the jobs of textile workers. A group known as the Luddites destroyed many machines in factories out of fear that they would lose their ability to gain employment, being outcompeted by the machines. Ironically, 50 years later there were ten to thirty times more jobs in those industries. The new jobs were created because of the increased productivity unleashed by the machines. Sadly, even though the lessons of that experience are clear, modern-day Luddites continually worry about robots and artificial intelligence destroying all the jobs around them.
Of course there will be challenges ahead, there always are. Workers will
have to obtain more extensive training than they did in decades past, as they
begin to work with more sophisticated machines that use statistical
algorithms to learn and perform better.
Machine learning is one of the most exciting and vibrant areas of research
in science and engineering today. In this book we are going to introduce
you to the world of machine learning, beginning with a discussion of how
machines and computers have evolved along with humans over our long
history.
We will also talk at length about data science, which is a field of growing
importance. Data science is now becoming one of the hottest career paths
around, and it’s used in multiple ways from Wall Street, to the Pentagon,
and in private firms like Amazon and Facebook (no surprise there). The
main focus of this book, however, will be on machine learning.
After this we will discuss the steps that are followed in machine learning,
including collecting data, data wrangling, analysis, training and testing
algorithms, and deployment. From here we’ll go over the main types of
machine learning, and then talk about algorithms.
Jason
Chapter 1: What is Machine Learning – and the Evolution of Machines
In this chapter we will introduce you to the concept of machine learning and
learn how it fits in the broader realm of computer science and artificial
intelligence.
A Quick History of Computer Science
In the early days of computer science, a computer had to be given specific,
step-by-step instructions in order to perform a given task. At first, these
instructions were entered in the computer using punch cards that a computer
system could read. Computers use a binary language in which everything is represented by 1s and 0s, which you can think of as yes and no, or on and off. It’s possible to build up complete streams of logic and to store and represent anything using only binary. That includes anything from
the stock market, to your basic information like name, age, and social
security number, to the pixels that make up an image.
Two levels of higher-level languages came into use in the early decades of computer science after the war ended. The first level is called assembly language. This is a lower-level language that still follows the “thought process”, if you can call it that, of the computer itself. It’s difficult to understand, and many people find it quite challenging to develop large algorithms using assembly language. The kinds of steps involved may include telling the computer to move a piece of data from one memory location to another, or having it go through the individual steps to multiply two numbers together. Assembly language is barely a step above binary.
You are probably familiar with some of the many high level languages that
exist today. In the 1950s, FORTRAN was the king of high level languages,
and it’s still used in many scientific applications, such as simulating the
detonation of nuclear weapons. While it’s partly used in applications like
that because of legacy reasons, it’s also used because it’s an extremely good
language for doing calculations.
As the 20th century wore on, other computer languages were invented and
became more popular. These included Pascal, Ada, C, and C++. The
language C++ was an extension of C that introduced the concept of
object-oriented programming. Rather than simply designing algorithms,
object-oriented languages let programmers build objects inside their
programs, and the objects can be acted on or experience “events”, which
takes the idea of an algorithm as a step-by-step of instructions to a higher
plane.
Later, the invention of the internet and smart phones resulted in a wide
proliferation of new languages that were specifically developed for use in
certain contexts. For example, Apple’s iPhones came to be programmed in Objective-C, an extension of C that actually predates the iPhone; it has since largely been replaced by Swift. On the internet and on Android phones we’ve seen the rise of JavaScript and Java,
among many others too numerous to mention.
The Evolution of Machines
Human beings are problem solvers, and machines are an extension of our
natural thought processes. Whenever there is a seemingly impossible task,
humans “put their minds together” in order to plan and figure out a way to
accomplish the task. This is something that humans have always done. Eons
ago, people might have had to figure out how to cross a river. Then in 1969,
humans put their minds together to land two men on the moon.
The first human tools were just extensions of our hands and limbs, and
substitutes for the big canine teeth and claws that we lacked. These tools
included cutters, scrapers, and spear tips all made out of stone. Although
their purposes are relatively straightforward, they were quite revolutionary.
These were the first steps taken to extend the human mind by using
machines, or tools if you want to call them that. This was indeed a giant
leap: the designer of a tool had to envision it in their mind, and then carve something out of rock that had never existed before. By using knives and spears, humans became powerful hunters, with an ability to cut and tear at their prey far superior to anything a lion or bear could muster with its natural defenses.
As the eons passed, tool making slowly grew in sophistication. At first, the
progress was extremely slow. People began making baskets and other tools
to carry and store things they needed like food or olive oil. They also made
plows, which made it easier to plant crops. For centuries things basically
stayed the same, as if people were slightly beyond the stone age but kind of
trapped in it. But as civilization became more sophisticated, so did the tools
it was using.
The wheel was invented in the old world, which allowed people to transport
goods and people far more effectively. The Romans built roads, and then
created aqueducts to bring water from far away places. Pretty soon people
were making windmills, and using the power of water to make the first
machines in the sense that we understand the word.
Early machines went beyond using a simple tool, which requires direct
human application of labor. The first machines used simple levers and
pulleys, which are devices that distribute and magnify applied forces in
order to do work. Levers, pulleys, wedges and screws were all invented in
ancient Greece, and of course we still use them today.
Despite decades of research and rapid progress, the way the human brain
works remains somewhat clouded in mystery. Understanding consciousness
and how we learn things continues to push the boundary of science. Despite
this, amazing progress has been made in AI over the past fifty years, even if
the original promise of the HAL 9000 computer still seems years away.
At the core, what do humans do as they go through life? They learn from
their experiences. As you learn, you get better at doing things; in other words, your behavior adjusts to incorporate the data you have taken in. You might say that your behaviors are algorithms, and that learning changes those algorithms.
This is also the core of artificial intelligence. That is, the idea behind artificial intelligence is to have machines that can learn from experience, and adjust their algorithms appropriately.
Computer systems that are artificially intelligent are able to recognize
patterns in data, and so they can be trained by feeding them large amounts
of data.
Machine Learning
Earlier in the book when we discussed the evolution of computer systems,
we noted that each step in an algorithm had to be programmed by a human
programmer, and in those days this was done using punch cards to represent
1’s and 0’s that the computer would use to carry out the tasks. When you
program in that fashion it’s tedious, and the computer acts as a passive
receptacle, merely carrying out the steps that you feed it to carry out various
tasks, and produce pre-determined answers that are dependent on coded
rules. But what if there was another way to use computers?
This is where machine learning comes in. The concept behind machine
learning is simple (to describe). You develop a computer system that is able
to learn from data it’s exposed to, improve its performance, and make
decisions without being explicitly programmed to do so. It learns from data
by recognizing patterns in the data and it does this automatically without
needing human intervention. Patterns in data can be detected using
statistical modeling. The more data it’s exposed to, the better it gets at its
job. This is the concept of “self-learning”.
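To make this concrete, here is a minimal sketch of self-learning in Python, using the popular scikit-learn library. The data and feature values are invented purely for illustration; the point is that no decision rules are written by hand:

# A minimal sketch of "self-learning": the model infers a pattern
# from examples instead of being given explicit rules.
from sklearn.tree import DecisionTreeClassifier

# Each row is one example with two made-up features, and each label
# marks the known outcome (1 or 0) for that example.
X = [[8, 1], [7, 2], [4, 5], [5, 4], [9, 0], [3, 6]]
y = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier()
model.fit(X, y)                  # the model learns from the data

print(model.predict([[6, 3]]))   # a decision about never-seen data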
The algorithms we described earlier, programmed directly by a human programmer who determined every single step along the way, illustrate a very different approach to computing. Traditional computer programming involves giving the computer direct commands developed by a human or groups of humans.
The algorithmic models look for relationships and patterns in the data in
order to make predictions about the outputs. The data itself will configure
the structure of the model rather than having a computer programmer do it.
The system has to be exposed to a large amount of data, otherwise it’s not
going to be able to accurately determine the underlying relationships in the
data and use that to make future predictions. Remember that in the real
world, there are always outliers that can throw off any pre-determined
outcome. In order for the system to learn well enough so that it’s not
making too many mistakes, it has to be exposed to enough data so that it
can also incorporate the outliers and unexpected results. Of course the
model is not going to be 100% accurate all the time; even a well-trained model is going to have misses.
Machine learning also has some fun applications. Big video game companies are using machine learning algorithms to have their games learn from experience just as the player does, and this can make the games more challenging as the player advances to new levels.
Machine Learning vs. Artificial Intelligence and Deep Learning
Machine learning is actually a subset of artificial intelligence. You
probably have an intuitive understanding of what artificial intelligence is.
You can think of a robot from a science fiction movie as your reference
point. That is, artificial intelligence is a system that has cognitive abilities
that would at the very least try to mimic the human mind, and possibly
exceed it, at least in certain tasks. These cognitive skills can include
problem solving abilities, and the ability to learn from examples. An
artificially intelligent system would have some ability to perceive its
environment, and then take actions to increase its probability of success. An
example of this would be a self-driving car, image processing, or a facial
recognition system.
So you can see from this that machine learning is actually a subset of artificial intelligence. Humans obviously learn from experience, and that is one of the goals of AI, which is accomplished by machine learning.
There are three broad types of artificial intelligence. The first is general
artificial intelligence or general AI, which would classify a machine that
would be able to complete any task that a human could perform. You could
put the androids here.
Narrow AI is where you have a machine that performs a specific task better than a human can. This type of AI could be used to, say, create robots that load and unload boxes in a warehouse. They never get tired or ask for breaks, and they work at a predictable pace. These types of robots already exist.
Finally we have strong AI, where machines can perform a task better than a person can. We already have computers doing this without AI, since they can tally up numbers, multiply and divide them, and calculate averages and standard deviations on data sets in an instant, when it would take a human hours. Image processing applications and chess-playing programs are good examples of strong AI in this sense.
Deep learning uses a more complicated neural network that has hidden
layers of nodes in between the input and output.
Therefore, machine learning is a subset of AI, and deep learning is a subset
of machine learning. The models in deep learning are more complicated
than the models in machine learning, but the basic concepts are the same.
What Sectors and Industries Use Machine Learning
Machine learning is used throughout the world of business and government.
The applications are wide ranging and varied, after all this is meant to be a
generalized method that can be used to apply computing power. We’ve
already touched on a couple of specific examples, now let’s get some idea
of how machine learning can be used in different sectors and segments of
society.
Government
As you might imagine, the government has a significant interest in machine
learning. The interest of government spans all levels, from local
governments through the Federal government. And of course
internationally, many of the world’s centralized governments are utilizing
machine learning for a wide variety of purposes, some good and some bad.
The military is utilizing machine learning across its operations. This will help the military develop more efficient machines that can operate with little or no human input. Depending on the application, this can be seen as a positive or as something that raises ethical concerns.
As we’ve seen, Southwest Airlines and other companies like UPS have
used machine learning to identify issues impacting wasted time, fuel, and
money. The U.S. military might not seem that similar to Southwest Airlines
and UPS at first glance, however remember that in addition to fighting wars
the military is a logistics and transportation powerhouse. The military needs
to keep troops supplied in remote locations, and it needs to be able to move
heavy equipment from one point to another with absolute efficiency, as well
as transporting large numbers of people by air, sea, or over land. Machine
learning is being used to improve logistics and transportation in the military
in order to cut fuel costs, reduce idle time, and make the movement of
people and supplies more efficient.
The Centers for Disease Control is beginning to use machine learning for a
wide variety of purposes. For example, one proposal is to use machine
learning to study epidemics. This could help healthcare workers identify
epidemics faster, and then act when necessary in response. As it is now,
data is often de-centralized and not processed until after the fact, when
patterns of cases showing up at various hospitals could be used to detect
when an epidemic is in formation and predict its future course.
Another area that local, state, and federal governments are interested in is
using machine learning to predict the behavior of people that have been
arrested. As you might imagine, this is quite controversial and may run into
constitutional issues in the United States. The way it would work is it could
estimate, based on the data points associated with a given individual, the
likelihood that they would show up for a trial if they were released. The
system can allow people to be released on their own recognizance, set bail,
or even recommend that a suspect be held without bail. The system can also
analyze the data points associated with the subject and make estimates on
the chance of reoffending if they are convicted.
As you can see, the government is using machine learning in many other
contexts, and throughout many different departments. These applications
are not without controversy, but because of their power governments are
going to try and implement them. So expect to be hearing a lot about this
going forward.
Financial Services
It won’t surprise you to learn that banks, mortgage companies, and financial
service companies of all types are using machine learning to enhance their
operations and make them more efficient – from their perspective. The
applications of machine learning to financial services are seemingly
endless, but the first area where they are used extensively is with fraud
detection. This can be used to determine when it appears someone is setting
up a false account, is using an account for fraudulent activity, or is using
identity theft to obtain financial services. A simple example that may touch some of our readers is the use of a debit card in a manner that is consistent with the patterns seen when cards are stolen. If you have experienced this, having your card shut off even though you still have it and are just running errands and paying with your debit card, then you have seen first hand the pitfalls that come along when machine learning gets it wrong.
Another area that machine learning has taken over when it comes to
financial services is credit checks and loan approval. The days of having to
speak to a loan officer are rapidly disappearing, and in some cases are
already gone. This is actually one of the most straightforward applications of machine learning: the algorithm is easily trained on the reams of data that have been collected over the years on the financial behaviors of tens of millions of people. This teaches the algorithm quite well and lets it predict who among today’s applicants is likely to default on their loan. Of course, as we mentioned earlier, these types of processes are never going to be perfect, and so mistakes are bound to happen.
Transportation and Shipment
We’ve already noted that Southwest and other airlines have been able to use
machine learning to allocate their resources more effectively and to avoid
wasting fuel and time unnecessarily. The same techniques used by the
airlines can be used in any application involving mass transit, or with
hauling and shipping. For example, machine learning can help a trucking
company use its resources in the most efficient way possible to cut fuel
costs and reduce idle time among drivers. Machine learning makes shipping
more effective, by being able to determine how to sort goods for shipping
and what goods should be loaded on what ships, working autonomously and
more efficiently than a human. We also face the possibility that at some
point there will be self-driving trucks, railroads, and self-propelled/directed
ships that travel back and forth to their destinations without human
interference. Of course there are many legal and ethical issues that come
along with these applications, and so it’s not clear how much the potential
applications will see actual use on a large scale.
Mining, Oil and Gas
Machine learning has immediate application to the oil and gas industry, and
also to mining. One area where it can be used is in the search for new
deposits or energy sources. This can make companies far more efficient
when they seek out areas for experimental drilling. Using machine learning,
they can quickly filter out areas that are likely to be less promising. A large
part of the mining, oil, and gas industries is transportation and storage.
Once again, the ability of machine learning to help make transportation and
logistics more efficient is able to help these industries a great deal. Machine
learning can also be used for more efficient distribution and maximizing the
efficiency of storage. These applications are not without risk: it’s possible that a human geologist would spot an area for experimental drilling that a system based on machine learning would miss; in fact, that is likely to happen sometimes. Overall, however, the application will be more efficient and effective.
Retail and Online Marketing
Everyone is talking about it these days – you are probably seeing ads online
that seem to read your mind. The technology that lies behind these systems
is based on machine learning. If you are looking for a new pair of shoes,
you are going to see advertisements for shoes all over the place. It’s not clear that this always works; the algorithms are not able to pick out intent when someone is looking at something online. If they aren’t already, these systems are likely to be tied to your behavior in physical or “brick and mortar” stores, not just to your online activity. This is one area where privacy advocates are not pleased, and it remains to be seen where this will go. However, since it’s producing powerful results for marketers and generating a lot of advertising revenue, it’s going to be pretty difficult to end the practice.
Healthcare
One interesting area where machine learning is starting to have an impact is
in healthcare. While medicine often requires a lot of problem solving skills
and interpretation from medical professionals, a lot of it is rule based, with
treatments that can be easily picked out by machine learning systems. You
can imagine how a system could be trained using databases containing the
characteristics of large numbers of people who have high blood pressure,
and studying the drugs that were chosen by physicians. A machine learning system can easily replicate this behavior, and maybe even do a better job, as it would be able to compare the characteristics of a person against the patterns seen in large databases of patients, looking for patterns that associate one treatment over another with success, or conversely with side
effects. One can envision a system of prescribing for illnesses, when the
prescriptions are essentially routine, as being done in a completely
automated fashion. This could also be done for prescriptions of antibiotics
for sinus infections or sore throats, or even to prescribe oral medications to
diabetics. This would free up doctors for other activities that are a better use
of their talents, and improve the overall efficiency of the healthcare system.
Machine learning can also be used as an assistant for a doctor in more
complex cases. An algorithm can spot something that a doctor might miss,
leading to more accurate diagnoses and treatment.
Chapter 2: Introduction to Data Science
You may have also heard the phrase “data mining”; it can be taken to mean the activity carried out by data scientists. Data mining seeks to discover
patterns that exist in large data sets. Returning to our example from the last
chapter, we can imagine some characteristic in common that people who
default on loans might have. It could be something unexpected that would
only emerge after the examination of large amounts of data, such as a tattoo
on their left ankle.
In addition to doing “data mining” the field of data science can be paired
with “big data”, another jargon word thrown around to describe this field.
Big data refers to dealing with extremely large data sets that are normally too cumbersome to handle with traditional tools. Since the turn of the 21st century, the increase
in the amount of data has been astronomical. More and more, people are
living their lives through the computer, whether it’s browsing the internet or
using their smart phone. This leaves a large trail of financial data,
entertainment data, shopping data, and more. This can require sophisticated
systems that use machine learning or it might just require large amounts of
computer power, storage, and processing, or some combination thereof.
Data science, big data, and data mining have come at a time when the
storage capacity of computers has become virtually unlimited and low cost.
Large facilities maintain banks of computer storage at a capacity never seen
before.
Is Data Science Really a Thing?
There is some argument as to whether “data science” is really a distinct
field of study. Some consider it a buzzword that is being used in place of
statistics, and that the modern “data scientist” is just a statistician using the
tools of machine learning and big data in order to extract information out of
large data sets.
However, the power of data science lies in its wide applicability. It can be
used in medicine, business, or by the military just to name a few examples.
What makes data science unique and powerful, and takes it beyond
traditional fields like statistics or computer science, is that it lies at the
intersection of multiple fields, and so a “polymath” is needed to be an
effective data scientist.
The Role of Machine Learning in Data Science
Machine learning is a tool that is useful for data scientists. By utilizing
machine learning, the data scientist can increase their productivity by
allowing the machine learning systems to do a lot of pattern recognition
without the need for direct human intervention, which would be impossible given the sheer size of today’s data sets in any case. The data scientist can
then focus on what humans do best, in that they interpret the patterns that
machine learning has found in the data.
The Tasks of Data Science
Data science will use statistical modeling and machine learning among its
tools, but what are the goals or outcomes that data science hopes to deliver?
There are many ways that data science can be used. We can’t possibly list
them all, but in this section we will discuss some of the major ways that
data science is applied in practice.
Predictive Analysis
The first major way that data science is used is in forecasting and predictive
analysis. This can be any type of forecasting; data science could be used to forecast a company’s expected performance in the coming year, for example.
This brings us to a related application where data science is being used with
increasing regularity. This is for the purpose of product recommendations.
You will see this on Netflix, Amazon, or YouTube. By analyzing your
previous habits and the habits of those who are like you, data science is able
to estimate movies, books, or videos that you are likely to be interested in.
Sometimes this works pretty well but sometimes it doesn’t.
Prescriptive Analytics
Data science and machine learning are finding many uses in a field known as prescriptive analytics. In this case, machine learning is used in order to
accomplish some specific tasks. Researchers in this area are aiming high,
looking to replace or augment humans in tasks that they actually do pretty
well. One area where this might be familiar is a self-driving car. Data
collected from autos being driven by human beings can be used to train the
machine learning systems.
This has also been used for some time with air travel, with “autopilot” being used when flying the plane is fairly routine; the pilots sit back and supervise, letting the computer direct the plane unless an emergency arises.
Pattern Recognition
If you have the iPhone X, then you’re familiar with another type of data science: pattern recognition. Data science powers the ability of the phone to use your face to unlock it. This is also done on older models, where the thumbprint, rather than the face, is used for the same purpose.
This kind of technology could also be used in many other ways, like
picking out the face of a criminal in a crowd.
Anomaly detection is another area where data science is used. Unusual data
patterns can often represent cases of fraud, but once again they don’t always
do so. If a book on Amazon had been up for three months, getting only a few downloads a month, and it suddenly got 40 reviews over the course of one or two days, the system would note that this was anomalous
behavior. I don’t know how Amazon works internally, but it’s possible they
would check into it, and remove the reviews if they turned out to be fake.
Such behavior can be used to alert a human to intervene in the situation.
The brains of animals and humans are wired in part to recognize patterns in
the external world. But when it comes to big data, our brains are too slow
and only able to analyze small bits of information at a time. When
presented with large data sets, they are simply too complex for humans to
take in or recognize the patterns that are hidden in the data. For example, in
order to determine what characteristics people have that might lead them to
purchase a pet cat and then go bankrupt, you would have to examine a large
number of characteristics of tens of millions of people. It’s pretty obvious
that a human being lacks that capability at a fundamental level. You might
be able to recognize a pattern studying 4-5 characteristics of a group of 10
or 20 people.
Computers, on the other hand, are lightning fast and able to find patterns
among large collections of data that humans couldn’t possibly pick out.
Since they are able to check multiple scenarios quickly and iteratively, a
computer system is going to be able to spot things that people simply can’t
detect. Since people can’t or at least can’t readily detect these patterns, we
often say that they are ‘hidden’ in the data.
There are many ways that pattern recognition can be used in machine
learning. For example, criminal behavior often follows distinct patterns.
Auto thefts are going to occur in some specific areas of a city far more often
than they occur in other areas. They may occur more often during particular
times of year. Sure, a person might be able to sit down with a few data
points and determine this with some measure of accuracy, but a computer
would be far more accurate. Using machine learning, it could examine tens
of thousands of cases of auto theft over a dozen years, and in the process it
would probably find patterns that a human being simply would not be able
to detect.
A grocery store could also use data science in novel ways. By issuing
shopper’s cards, the store not only can determine what individuals may
purchase, but now they also know where their customers live. They can
determine how often people from different neighborhoods shop in the store,
and possibly find patterns as to time of day people shop, what days they
shop, and how much they spend each time they visit the store. This would
help the grocery store laser target their advertising. Their machine learning
tools might also detect unusual patterns the store’s owners or managers
haven’t thought of, for example detecting that a lot of the customers come
from a distant location. This could prompt them to open a new store close to where those people live, increasing the efficiency of distribution and making shopping more convenient for their customers.
Examples of Data Science in Use
Data science has been used by industry to make big changes and save money in their operations. In one famous example, Southwest Airlines
claims it determined how much time its planes were spending idling on the
tarmac before takeoff, and it was able to use this information to save
hundreds of millions of dollars by reducing fuel waste. Other airlines are
using data science in order to do better planning of their routes, and
improving their logistics.
Data science has also been used by companies like Netflix to help them
recommend movies for you to watch, based on your past habits. You can
see from this example how machine learning and data science are deeply
intertwined. Of course, data science is a human field, and machine learning is one of the tools that data scientists rely on.
Data Science as a Career
There is no question that data science is a hot career path right now, and it
will continue to be one for the foreseeable future. Most universities don’t have a specific data science program; that is something you can create on your own. In order to do that you could major in either computer science or mathematics and statistics, but either way, you will want to take courses in the other field as well as in business. Remember, in our discussion above we noted that a statistician would have to learn a lot of computer science, as well as get some business education, before they could really be considered a data scientist. You don’t necessarily have to get a PhD in order
to make a career out of it, but in competitive science related fields an
advanced education is definitely going to give you an edge. In computer
science, you should focus on AI and machine learning in order to pursue a
career in data science.
If you are interested in a career that uses machine learning, data science is
but one path. You always have the option of going straight through
computer science with a specialty in machine learning. It’s a certainty that
you will be very employable if you choose this path.
Chapter 3: Supervised Machine Learning
In the earlier chapters we have just glossed over the concept of machine
learning. It turns out that there are four major types of machine learning that
need to be considered. In this chapter we are going to find out what they
are, and how they are used in order to achieve the goal of machine learning.
There are four main types of machine learning that are used. These include
supervised learning, unsupervised learning, semi-supervised learning, and
reinforcement learning. Understanding the differences between these is an
important part of understanding machine learning.
Supervised Learning
With supervised learning, the goal is to determine a functional relationship
or mapping between outputs Y and inputs X. You can think of this as a mapping relationship like a function in mathematics; that is, Y is some function of the inputs X:
Y = f(X)
In the real world, in many, if not most cases, the relationship between an
input and an output variable is not a direct functional relationship, but is
instead a probabilistic one. For this reason, a data scientist must have an
advanced understanding of probability and statistics in order to be
effective.
The job of the algorithm during training is to find the patterns behind the
relationship between X and Y. Eventually, the system would be able to
predict the outputs Y for new inputs X for which the results were not
known in advance.
The algorithm uses an iterative method in order to seek out the patterns that
will allow it to correctly predict the outputs. This process of feeding the
model labeled data is analogous to teaching the model. It’s similar to a teacher in an algebra class teaching the students the concept of a squared number; the teacher already knows what the answers are.
Data used for supervised learning is grouped in a row and column format,
just like a spreadsheet. Each column is given a label called an attribute. The
various characteristics of a data point or object are also known as features.
A single data point would consist of one row from the data set. So a single
data point can have multiple attributes. Data can be numerical or categorical; categorical simply means any kind of data that is not numeric, and it takes the form of labels made of characters rather than numbers.
In this case, a data point would be an entire row; for example, the row (128, 72, 204, 50, Hispanic) would be one data point. The values (128, 72, 204, 50) are numeric data, and “Hispanic” is categorical data. In the case of supervised learning, we would know beforehand which of these cases developed diabetes and which did not, and we could train the algorithm to learn to pick them out by repeatedly feeding it the data until it attains the accuracy we desire.
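As a sketch, the row above might be built in Python with the pandas library like this. The column names and their order are assumptions made for illustration, since the original table is not reproduced here:

import pandas as pd

# One row per data point; the last column is the known label used
# in supervised learning. Column names are assumed for illustration.
data = pd.DataFrame({
    "weight":      [110, 128, 95],
    "height":      [68, 72, 70],
    "blood_sugar": [150, 204, 180],
    "age":         [38, 50, 45],
    "ethnicity":   ["Asian", "Hispanic", "White"],  # categorical
    "diabetic":    [0, 1, 0],                       # known outcome
})

print(data.iloc[1])   # the data point (128, 72, 204, 50, Hispanic)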
A single input for machine learning of the form (A, B, C, …), that is, one row of attribute values, is called a vector (or feature vector).
The goal of training is to have the algorithm refine its model so well that it’s able to predict the outcome accurately enough (often termed “reasonable”) when it’s presented with never-before-seen data.
Supervised learning follows a path of well defined steps. This begins with
human input, by selecting the types of training examples that are going to
be used in the training process. Once that has been decided, real world data
with known outputs is collected from databases. Vectors are then created using the attributes or features that are to play the role of inputs, and they are assembled into rows as described above. Although it might seem
like the more features you have the better, it’s actually preferable to have a
relatively small number of features that are used in the training. This will
make it easier for the system to accurately determine the patterns and
relationships that exist in the data. It will then be more able to accurately
predict future results.
Once the data has been gathered, the algorithm is trained on the training set. The algorithm will adjust its own parameters as it determines the patterns in the data. Then the accuracy of the algorithm is evaluated on test data. If necessary, the process is iterated in order to keep refining the algorithm, using more data.
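A sketch of this train-then-evaluate loop, using scikit-learn with synthetic stand-in data (nothing here comes from the book’s own examples):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data standing in for real-world records.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Hold back some data so the model is judged on examples it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)            # the model adjusts its parameters
predictions = model.predict(X_test)    # predictions on unseen data
print("accuracy:", accuracy_score(y_test, predictions))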
Selecting an Algorithm
When the model is in the design phases, the data scientist will choose an
algorithm that will form the basis of the model. In this case, the algorithm is
a type of model used to determine functional relationships. We will be
investigating this in more detail later, but for now we will note that these
types of algorithms include linear regression, decision trees, nearest
neighbor analysis, or non-linear regression, to name a few. Don’t worry if you don’t understand what these terms mean; we will discuss them in detail
later. They are general techniques that can be applied to data sets in order to
determine functional relationships.
Bias Errors
A model can be trained on any number of data sets, and from the perspective of a human managing this process, it may not be possible to say whether one model is any better than another. However, using a particular data set is going to bias the model.
Variance Errors
Here we again imagine using a process to build multiple machine learning
models, each one trained on its own data set. Variance error reflects how much the predictions for a single data point vary from one trained model to the next. In the earlier section, we gave an example of a set of data points that could be used to predict whether a specific individual would become diabetic given height, weight, age, blood sugar, and ethnicity. So you’d feed in a specific case and note how the predictions differ across the models and from the actual result.
A model that is flexible is one that is going to have a larger variance error.
Total prediction error is the sum of bias error and variance error.
High variance is also a consequence of data sets that contain a large number
of inputs. The more inputs the higher the variance error. That is, we can see
that it might be relatively easy for a machine learning system to determine
the probability that a given person will get diabetes within five years given
their height, weight, age, and fasting blood sugar or A1C value. But if you
keep adding more information to each data point, such as height, weight,
age, fasting blood sugar, ethnicity, occupation, years of education, prior
military service, born by cesarean section etc., it becomes more difficult for
the model to learn, and hence its error rate will be higher.
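One way to see this trade-off in action is to vary how flexible a model is and compare its error on held-out data; you would typically expect the most flexible model to show the largest test error. This sketch uses synthetic data and scikit-learn, with numbers that are illustrative only:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=60)   # noisy data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A stiff model (degree 1), a moderate one, and a very flexible one.
for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    error = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: test error {error:.3f}")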
Noisy Data
Another error that can arise in machine learning comes from measurement or human error in the input data.
Measurement error is common, and of course we all know that human error
when it comes to data input can be common as well. It’s straightforward to
recognize that bad input data is going to train the algorithm badly, and so
it’s going to have erroneous predictions.
Classification vs. Regression
Generally speaking, supervised learning can be divided into one of two
different types, classification or regression. Classification is a simple
learning process that can be binary or more complex. The prototypical
example of classification is determining whether or not an email is spam, or
whether it’s a genuine message. Although this seems like a simple question,
I am sure that most readers have had one or more experiences with spam detection systems that misclassify valid emails or let spam through the filter.
Regression, on the other hand, predicts a continuous numerical output rather than a category. The simplest case is a straight-line relationship between input and output:
Y = c + b*x
Here b is the slope of the line and c is the intercept; training amounts to finding the values of b and c that best fit the data.
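As a sketch, fitting this straight line with scikit-learn looks like the following; the data points are invented so that the slope comes out near 2:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly y = 2x

model = LinearRegression()
model.fit(x, y)

print("slope b:", model.coef_[0])          # close to 2
print("intercept c:", model.intercept_)    # close to 0
print("prediction at x = 6:", model.predict([[6.0]])[0])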
Variations on Supervised Learning
It’s possible to have a hybrid model between supervised and unsupervised learning. This is called semi-supervised learning. When this training method is used, a subset of the data will be labeled, that is, it will include the desired output values. The training data will also include a subset which is unlabeled. In most cases, the subset of data that is labeled is smaller than the unlabeled set.
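Here is a sketch of semi-supervised learning using scikit-learn’s LabelSpreading model, which accepts a mix of labeled and unlabeled examples; by convention, unlabeled points are marked with -1. The data is synthetic and purely illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=200, random_state=0)

# Pretend most labels are unknown: keep only the first 20.
y_partial = np.copy(y)
y_partial[20:] = -1            # -1 marks an unlabeled example

model = LabelSpreading()
model.fit(X, y_partial)        # learns from labeled and unlabeled data
print("accuracy on full set:", (model.predict(X) == y).mean())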
Different situations will call for different approaches. That means that
sometimes supervised learning will be more appropriate to use, while at
other times unsupervised learning will be the better choice.
Chapter 4: Unsupervised Learning
There are three general types of machine learning that fit under the category of unsupervised learning. The three that we will discuss include:
Clustering
Neural networks
Markov algorithms
Clustering
So when clustering is used, each data point is placed into a defining group.
This is done because it’s believed that members of a certain group are going
to have some characteristics in common. For example, dog owners are
going to have different characteristics in common as compared to cat
owners, and people who own dogs and cats may have unique characteristics
of their own.
This concept has a wide array of applications, one of which is the creation
of a ‘lookalike’ audience in Facebook advertising. The principle behind this
system is nothing more than clustering – Facebook will analyze the features
of people who have purchased a particular type of product in the past, and
then use that to create a new listing of people who have yet to purchase the
product but based on probability estimates, are likely to purchase the
product in the future because their features match up with previous buyers.
When using clustering, the data scientist can determine the number of groups or clusters that are used for the learning. The center of each cluster goes by the fancy name of cluster centroid.
There are several different types of clustering that can be used, but we will
only be able to review two of them here. These are relatively simple to
understand, and they form the basis of other clustering techniques. So once
you understand the basics then you’ll find it easy to pick up different
algorithms.
K-Means Clustering
The first type of clustering we are going to examine is k-means clustering.
The idea behind this approach is to determine the center, or mean of the
clusters. The data scientist can start this process off by choosing the number
of clusters, and it’s been suggested that manually examining the data is
appropriate in this case. So you can eyeball the data to estimate how many
clusters (that is groups or categories) should be used for the data. We could
do that with our database data, and maybe we would use ethnicity and
gender, or ethnicity, gender, age, and BMI.
Each iteration assigns every data point to its nearest cluster center and then recomputes each center as the mean of its assigned points, making the group centers more accurate as time goes on. K-means clustering is a commonly used method, and it’s simple to understand and easy to use.
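A sketch of k-means with scikit-learn; the points are invented so that two dense regions are easy to see:

import numpy as np
from sklearn.cluster import KMeans

points = np.array([
    [1.0, 1.1], [0.9, 1.3], [1.2, 0.8],   # one dense region
    [8.0, 8.2], [7.8, 8.5], [8.3, 7.9],   # another dense region
])

# The data scientist chooses the number of clusters up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("cluster labels:", labels)
print("centroids:")
print(kmeans.cluster_centers_)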
Mean Shift Clustering
Mean shift clustering is another averaging based approach. The idea behind
this method is that there is going to be a tendency for data points to be
dense and sparse in different areas. For example, people who become
diabetic are going to be clustered more around ages 45-54 than they are
around 18-24. The latter age group will have some diabetes cases, but they
will be comparatively rare as compared to the 45-54 age group.
In order to find the dense zones of data points in a large data set, a
technique known as windowing is used. Again calling on the diabetes
example, we might imagine that we have a database of a million people to
examine. That is certainly not something that a human observer is going to
be able to look at and tease out the patterns from the data, but you can see
how the human observer would be able to look at a small subset of the data
and then determine the different classifications that should be used when
analyzing it. The algorithm will try to find the centroid of each group or
class.
We can also look for clusters of data that have a lot of data points for the
groups that have been defined by the data scientist. Let’s say that we are
using a database of people who have applied for new home loans in the
southern California region. We could classify them by current address, age,
ethnicity, income level, education level, and so on. The data can be sliced
and diced in any way you see fit, for example you might want to classify
the data by income level, then within each income level you would look at
the other characteristics mentioned such as educational attainment or
current address.
This is where the commonality comes in. If you recall, we mentioned the
possibility or probability that data points that cluster are likely to have
many features in common. As an obvious case, looking at the home loan
data we’d probably find that there is a clustering of people with college
educations or higher among the group that has an annual income over
$150,000 a year, while a group with an annual income of $75,000 a year or
less is more likely not to have a college education.
The process of clustering is an iterative process, and it’s continued until the
mean or centroid stops changing, or changes at a rate that is below some
level that you have pre-determined to be an acceptable level of error.
It may be the case that the first time you use this procedure, it doesn’t work perfectly for you. In that situation, the usual approach is to try again using different groups or clusters.
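A sketch of mean shift with scikit-learn. Unlike k-means, the number of clusters is not fixed in advance; the algorithm estimates a window (the “bandwidth”) and finds the dense regions itself. The data is synthetic:

import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8,
                  random_state=0)

ms = MeanShift()               # window size is estimated automatically
labels = ms.fit_predict(X)

print("clusters found:", len(np.unique(labels)))
print("cluster centers:")
print(ms.cluster_centers_)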
Neural Networks
We touched on the basic idea of neural networks earlier. The fundamental
idea behind a neural network is to set up interconnected nodes. In some
sense, this is a representation that mimics the way the brain is structured.
It’s interesting that the brain isn’t just structured this way at the cellular level; it follows this model all the way up to the highest levels. There are many independent regions in the brain that
perform specific tasks, and they are interconnected with each other. For
example, one area of the brain processes visual input. Another area of the
brain processes listening to speech, while another processes actually
speaking. Still other areas of the brain govern the formation of memory and
emotions. These are the nodes of high level neural processing in your brain.
They are all connected to each other by nerve axons which form
communications channels.
The idea behind neural networks was to create a computer system that
could learn in a similar way that the human mind learns. In the case of a
neural network, the specialized areas of the brain or individual nerve cells if
looking at it on that level are replaced by independent functions, which in
computer science terms can be thought of as independent black boxes. They
are connected to each other through parameters that are exchanged.
Once again, the concept of pattern recognition comes to the fore. A neural
network is designed to learn by seeking out patterns in the data that it
receives.
In your brain, nerve cells have different numbers of connections with other
nerve cells. Connections can strengthen, the more you learn something. If
you forget something, the connections will be pared back. A neural network
attempts to mimic this behavior, strengthening connections between
particular nodes in the network.
The fundamental idea behind neural networks is the same as that used for
other types of models. The nodes in the network can be considered to be a
single black box, so you can view the neural network from the outside and
not worry about the internal details. Like other models, the neural network
will have a set of tunable parameters. The way to tune the parameters is to
continually feed data into the black box – that is we train the neural
network.
Then, as it gets trained on more data, the parameters get more finely tuned,
and more likely to give the correct answers when fed new, never before
seen data. Does this sound familiar? The same basic concepts are here, but
with a different implementation.
We use unsupervised learning with neural networks, and so they are not
going to be fed the correct output answers. Instead they will get unlabeled
data. The basic structure of a neural network can be described as follows. On the left, we have a representation of our input data, which is fed to the network. In the middle are the hidden layers, the black box (we are peeking inside, so picture a dashed box instead of a black one). Then on the right side is the output.
The number of nodes in the hidden layers can be varied; imagine three nodes per hidden layer. A general rule of thumb is that you should not try to “overfit” the data. Overfitting occurs when there are twice as many nodes as there are inputs. Here, we have three input nodes, and so the overfitting condition would be six nodes per hidden layer. Often, the number of nodes chosen is about the average of the input and output counts. Imagine only one output node for simplicity, though obviously there can be more. Using the average, in this example the best-case scenario would be 2 nodes in each hidden layer, but it’s an approximate relationship, and so three is fine.
When using neural networks, in the same manner that you do with all types
of machine learning you will set a desired error level that is acceptable to
you. When the system gets to that error level, training stops because the
neural network is performing as well as you need it to perform. You can
also have training last for a fixed number of iterations. With each iteration,
the model will improve itself, and then continue until it reaches the number
of iterations that you have specified.
The inputs and outputs of a neural network are similar to what we’ve
already seen. For example, we could set up a neural network to classify
borrowers as high risk, medium risk, or low risk. The inputs to the system
could include age, gender, occupation, income level, credit score and any
other parameters that we felt were relevant.
Each node of the neural network is some function, so it’s basically one of the functions that we’ve seen before. In the beginning, the “brain” is considered stupid, so you might start out with one layer, and a bias is assigned to each node in the hidden layer. The bias assigned to each node is random.
Each input node has a weight that is assigned to it. So if credit score is more
important than occupation, then the weight of the credit score node could be
assigned some value like 0.7 and the weight of the occupation could be
assigned a value of 0.45, say. These are then fed into the different nodes. In the beginning, the system starts by guessing, until the activation function takes over.
The data is then run through the activation function for each node. The activation function creates a nonlinear estimate of the output, and this is then compared to the correct answer to get error estimates. This tells you how far off the guesses generated by the assigned bias of each node were from the correct answer. This information is then fed back into each node, and the bias of each node is adjusted. The weights of the inputs can be adjusted as well; so if you find that the model above underestimated the weight of the credit score in determining the loan, the weight could be increased: say we adjust the weight of the credit score to 0.72 and the weight of occupation downward to 0.42.
In neural networks, the learning rate is the amount that the biases and
weights of the system are adjusted with each iteration. Momentum is a
measurement used in neural networks to determine how much past results
affect the weights and biases. When the neural network has gone through
the entire dataset, that is an iteration, and these are repeated either a fixed
number of times or until the level of error is acceptable.
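To tie the pieces together, here is a bare-bones sketch of a single “neuron” trained with the mechanics just described: input weights, a bias, a sigmoid activation function, error fed back, and a learning rate controlling the size of each adjustment. All the numbers here are invented:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two made-up inputs per example (imagine credit score and income,
# both scaled to the 0-1 range), with a known 0/1 risk label.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]])
y = np.array([0.0, 0.0, 1.0, 1.0])        # 1 = high risk

weights = np.array([0.7, 0.45])           # initial input weights
bias = 0.0                                # initial bias
learning_rate = 0.5

for iteration in range(1000):             # a fixed number of iterations
    output = sigmoid(X @ weights + bias)  # run the activation function
    error = output - y                    # how far off the guesses were
    grad = output * (1 - output) * error  # feed the error back
    weights -= learning_rate * (X.T @ grad) / len(y)
    bias -= learning_rate * grad.mean()

print("adjusted weights:", weights)
print("predictions:", sigmoid(X @ weights + bias).round(2))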
Chapter 5: Reinforcement Learning
We saw that in the case of supervised learning, the answers to each input in the data set were provided during the training phase. Reinforcement
learning withholds the answers, but uses the concept of reinforcement to
guide the correct behavior. The model has a reinforcement agent that
decides on its own what to do. The agent’s task is to find the best path to get
a reward that acts as reinforcement. So far we have described a cartoon
model, but this is a pretty accurate description as far as the basic concept –
think of giving a reward as a feedback mechanism to improve the output.
Typically reinforcement learning is used when there are multiple solutions
to a problem, so this is an alternative to supervised learning which has one
correct answer for a given input. The programmer is more involved in this
case. When the algorithm produces a given output, then the programmer
can decide to reward the model or not in the correct fashion to guide it to
the answer.
In the old days, horse training methods tended to be harsh. In other words,
they relied on a negative reinforcement type of system, that is also known
as punishment. The trainer might carry whips and ropes with them, and
when the horse did something that the trainer didn’t like, they might use the
whip to strike at the horse on its lower legs or on its rear end. This type of
training relies on creating fear in the horse, which we might consider to be
the agent in our analogy. The desire here is to get the horse to produce the
output we want by behaving in a specific fashion and carrying out certain
work duties, such as being tame for riding or pulling a wagon.
You can also use a positive reinforcement mechanism. When using positive
reinforcement, the trainer would keep some treats with them. When the
horse did something the trainer wanted the horse to do, then the trainer
would give the horse a treat.
A similar method is used with dogs. You can praise the dog verbally, give
him a pet, and then feed him with a treat when he does something that you
want him to do.
There is even a third alternative, which is to use a combination of positive
and negative reinforcement. In the case of animal training this would
involve using force or punishment when the animal did something wrong,
and then responding with a treat with praise when the animal does
something desirable.
Don’t get too carried away with our discussion of horse and dog training; it was only a loose analogy, meant to get you in the frame of mind of seeing how different reinforcement mechanisms can shape the behavior of a system. In that case the system would be the horse. People aren’t used to thinking of children or animals as “systems”, but hardcore computer scientists might.
Humans also learn in part through this interaction model. One common way
that reinforcement learning is explained is to imagine a child approaching a
fire. Suppose that it’s winter and it’s cold outside because there has been a
major snowstorm. A child may approach the fireplace in order to get
warmth. As the child approaches, the child feels better. They can get very
close to the fire, and they still feel good, warming their body up after
having been outside in the snow. The child may have an urge to put their
hands out facing the fire in order to warm up their extremities. This will feel
good to the child, and so it acts as a positive reinforcement mechanism.
Then if the child reaches out further to actually touch the fire, they will feel
the burn and extreme heat, and pull their hand away and start crying. This is
negative reinforcement, which teaches the child not to actually touch the
fire. We see that in the same situation the child has learned two things
through positive and negative reinforcement. The first is that fire provides
warmth, as long as you stay a little bit distant. The second lesson is don’t
touch the fire.
In summary, the agent learns the correct behavior by performing actions and
seeing the results. This type of machine learning is particularly effective
for making artificially intelligent versions of games. So, for example, we
can make a chess game, or a computer version of Go, that uses artificial
intelligence. In both cases, the agent and reward model would be used.
Consider a chess game: the agent is the computerized chess player. It learns
by playing games and making moves that bring rewards, and from this
experience it learns which moves are the right ones. In the beginning the
agent is clueless; it may know the basic rules of the game, but it's not
going to know which moves in which situations will lead to victory against
talented opponents.
The State Model of Reinforcement Learning
Reinforcement learning begins with the system in a certain state, or
configuration. If you are unfamiliar with this concept, think of a lamp. The
lamp has two states – on and off. The lamp begins in the state OFF, and
there is one action an agent could take with respect to the lamp, which is to
flip the switch to turn the lamp to the ON state.
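As a tiny sketch, you could represent the lamp's state and the flip action in a few lines of Python; this class is invented purely for illustration:

class Lamp:
    def __init__(self):
        self.state = "OFF"   # the lamp begins in the OFF state

    def flip_switch(self):
        # The one action an agent can take: toggle the state.
        self.state = "ON" if self.state == "OFF" else "OFF"

lamp = Lamp()
lamp.flip_switch()
print(lamp.state)   # ON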
You can think of more and more complicated examples. A calculator (more
likely an app on your phone than a physical calculator these days) begins
in a state of zero; that is, there is a zero displayed on the screen. As
the agent, you can take actions on the calculator to change its state.
The agent can take an action based on the state provided by the
environment. In a chess game, the human player may open with a specific
chess move. The agent will analyze the situation, and then it will decide to
make a certain move in response. A reward may result from the action.
Reinforcement learning is a loop-based system. After the agent has made
its first move, the new situation forms the basis for the input to the next
state. You can view a chess game as iterating through a series of different
states: as each player makes a move, the state of the game changes.
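The loop itself can be sketched in a few lines of Python. The LineWorld environment below is a toy invented purely for illustration (the agent stands on a number line and is rewarded for reaching position 5); the reset/step interface is a common convention, not a fixed standard:

import random

class LineWorld:
    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):               # action is -1 (left) or +1 (right)
        self.position += action
        done = self.position == 5         # the episode ends at the goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done

environment = LineWorld()
state = environment.reset()
done = False
for _ in range(10000):                    # cap the episode length
    action = random.choice([-1, 1])       # a "dumb" agent picks randomly
    state, reward, done = environment.step(action)
    if done:
        break

A smarter agent would use the rewards it has seen to stop picking actions at random, which is exactly what the learning methods below aim to do.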
Short Term Rewards
In reinforcement learning, rewards that arrive far in the future are usually
discounted, meaning they count for less than rewards received right away. The
size of the discount is controlled by a parameter called gamma, a number
between 0 and 1. When gamma is small, the discount is large. When gamma is
large, the discount is small. Thus gamma and the discount have an inverse
relationship. A large gamma is associated with an agent that seeks out the
long-term reward; an alternative way to state this is that a small discount
is associated with an agent that seeks out the long-term reward.
But if gamma is small, the agent prefers to get the short term reward. At
least the agent knows that they will get something. As we said above, this is
also the case when the discount is large.
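As a small sketch of how gamma works in practice, here is the standard discounted-return calculation in Python; the reward numbers are made up purely for illustration:

# Discounted return: each future reward is multiplied by gamma once per
# time step, so rewards far in the future count for less.
def discounted_return(rewards, gamma):
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total

rewards = [1.0, 1.0, 1.0, 10.0]          # made-up rewards over four steps
print(discounted_return(rewards, 0.9))   # 10.0: a large gamma values the 10
print(discounted_return(rewards, 0.1))   # 1.12: a small gamma barely sees it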
Have you played the Pac-Man game? In this game, you control the main
character, and you can try to get through to the next level or focus at first
on eating up as many dots as you can to earn points. The game is set up so
that you're going to have an inclination to do the latter: you might as well
build up some points before running into an enemy character and getting
killed. Pac-Man can therefore be thought of as a game with a large discount
for the main player.
When are Rewards Given
Let’s think in terms of a video game to understand this concept. If the
rewards are given at each stage of the game, it’s called temporal difference
learning. So you can get a reward during game play. The alternative is
Monte Carlo, where rewards are cumulative and handed out at the end.
Episodes
An episode in a game can be the first round of play. If you are good at the
game, you can get through the entire first level without being killed; the
end of the level would be the end of the first episode. However, you don't
have to make it to the end of the level to end the episode: getting killed
also ends it. You can then replay the episode if you haven't run out of lives.
Continuous Tasks
A game can take a different approach, and rather than having levels it can
be continuous. An endless runner or flappy bird type game could be
continuous. At every instant, the agent is interacting with the environment
and can be rewarded or punished for exhibiting the correct behavior.
Q Tables
A Q-table is a collection of state-action pairs, arranged in a table. The
Q-table is updated when each episode is completed: depending on performance,
the q-value for each pair, which is initially set to zero, is updated. The
agent can then use the q-values in the table to determine its next action.
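A sketch of the standard Q-learning update, using a Python dictionary as a toy Q-table; the states, actions, and numbers here are placeholders invented for illustration:

# A toy Q-table: (state, action) pairs mapped to q-values, initially zero.
q_table = {}

def get_q(state, action):
    return q_table.get((state, action), 0.0)

def update_q(state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    # Standard Q-learning update: move the old estimate toward the reward
    # plus the discounted best q-value available from the next state.
    best_next = max(get_q(next_state, a) for a in actions)
    old = get_q(state, action)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical example: in state "s0", taking "right" earned a reward of 1.
update_q("s0", "right", 1.0, "s1", actions=["left", "right"])
print(q_table)

Here alpha is the learning rate and gamma is the same discount parameter discussed above.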
Chapter 6: Algorithms for Supervised Learning
Linear Regression
Linear regression is one of the simplest and most widely used algorithms for
supervised learning. It models the relationship between an input variable x
and an output variable y, which in its most general form we can write as:
y = f(x)
Since linear regression has become closely associated with data science,
independent and dependent variable labels are not used as frequently
anymore. You can also refer to x as the predictor, and y as the response
(output is also acceptable). When there is only one input or predictor, the
model is said to be simple, and so if you hear the phrase simple linear
regression, this means that it’s linear regression with one input variable. A
model that had six input variables would not be simple. Instead, that would
be called multiple linear regression.
The line that is drawn through the data is calculated using linear
regression. That is why the term 'linear' is used: it's a linear
representation of the relationship between the input and output variables.
This type of model is not going to give an accurate prediction of the output
every single time. What you can say is that it produces a line fitting the
empirical or measured data that gives you the trend between the input and
output variables. Suppose, for example, that a chart gave the heart attack
rate for 50-year-old men at different body mass indexes. The horizontal axis
would be the body mass index and the vertical axis would be the heart attack
rate. What we would have is a line that fits the data, so we could give a
statistical answer for the heart attack rate at a given BMI. However, any one
individual is not necessarily going to fit the statistical trend.
y = mx + b
When used in machine learning, the job of the system is to learn from a large
data set to determine the values of the slope m and the intercept b as
accurately as possible. The more data and the more iterations used, the
closer m and b will come to the idealized values.
e = ŷ – y
where ŷ = mx + b is the value the model predicts and y is the measured value.
In practice, what’s done is you calculate e for a large range of data points,
and then you square the differences, and sum them up. So now we have a
measure of the total error. In order to make the model as accurate as
possible, we want to minimize this quantity. This is called the ‘least squares
method’ because we are trying to get the least or smallest value of the sum
of the squared errors.
This is a simple calculation, and the values calculated are used to find m
and b in the formula that gives you the straight line. So this is a very simple
type of machine learning, but don’t be fooled by its simplicity. It has
enormous value in research and has great predictive power.
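Here is a small sketch of the least squares calculation for simple linear regression, using the standard closed-form formulas for m and b; the data points are made up for illustration:

# Least squares for simple linear regression: find m and b that minimize
# the sum of squared errors. The data is invented and roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Standard closed-form solution for the slope and intercept.
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - m * mean_x

print(f"y = {m:.2f}x + {b:.2f}")   # roughly y = 2.00x, as the data suggests

This is exactly the "do it by hand" calculation suggested below, just automated.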
Researchers are going to note certain things about the relationship between
the input and output variables. For example, what is the trend? If the line is
sloped upwards, that means that the output increases when the input
increases. On the other hand, if the line is sloped downward, the output
decreases when the input increases. Simply knowing this relationship can
be very important for many applications. For example, there is a
relationship between the HDL cholesterol level and your heart attack risk
that follows that type of pattern. HDL cholesterol would be the input
variable, and heart attack rate is the output variable. If you were to get a
large set of data from a health study and build a linear regression model
from it, you would find that when HDL cholesterol level increases, the
heart attack rate decreases. So any linear fit that you got between the two
variables would have a downward slope.
Not all data sets are as scattered as the one in our example; in fact, in
most cases where linear regression is used you're likely to see the data
points clustered closely about the line used to fit the data. If the model
didn't have strong predictive power, it wouldn't be used so much. Once the
machine has learned on a set of training data, it will be able to predict
output values for new inputs that it hasn't seen before.
There is much more you can learn about linear regression; there are entire
courses taught on this one topic alone. You can do yourself a favor by
becoming extremely well acquainted with it. One nice thing about this model
is that it's simple enough that you can understand it completely. Work
through it by hand with small data sets to see how it works, then get some
larger data sets and set up a linear regression model in a spreadsheet, or
write a small computer program to implement it.
Although we didn’t discuss it here, you can use linear regression with
multiple input variables, but it’s more complicated. That is something that
you are probably going to want to take straight to the computer.
Logistic Regression
Logistic regression is another method used to determine a statistical
relationship between input and output variables. The first situation we are
going to look at is when the output variable is binary, that is, it can only
have one of two states: yes or no, male or female, 0 or 1, and so on. When
this is the case, data scientists say that the variable is 'dichotomous'.
There can be one or multiple input variables for a logistic regression model.
Logistic regression is used any time the question you are asking has a yes
or no answer. It could be used to detect spam email, since whether or not an
email belongs in the spam folder is a yes or no question. Linear regression
can be used to solve such problems as well, but with linear regression you
would have to pick a threshold to determine whether a given input produced
a "yes" or a "no".
There are certain mathematical functions that can be used to take input data
and generate a yes or a no answer. These functions will smoothly but
rapidly approach one value or the other. The power of these functions is
they can help the modeler set a threshold to determine whether or not
something is a “yes” or a “no”.
Consider the case of a spam email. A spam email might have certain
characteristics, like a claim that you are about to receive money. It may
ask you for personal information, or appear to come from a certain source
when an analysis of the server it actually came from shows that this is
deception. It may use certain language, like claiming you have received a
free gift. Each characteristic that has been identified as being associated
with a spam email can be given a score or weight. Then when the email has
been thoroughly examined, the weights can be put together in order to
develop a composite or total spam score. There will be a threshold set by
the system to classify whether or not something is spam. If the email passed
the threshold, then it will be identified as spam. If not, then the email is
going to be routed to your main inbox. The threshold used to make the
determination is known as the decision boundary.
Of course the system is going to make mistakes. Some emails that score above
the decision boundary will actually be emails that the user wants to
receive. However, since machine learning systems are always able to learn,
the user can move a given email out of the spam folder, and this will teach
the system that the next time an email comes from that sender, it's not spam.
Simple logistic regression begins with a model that is similar to the one
used for linear regression. At the core of the algorithm there is a linear
relationship of the form y = mx + b that is used to fit the data; that linear
value is then passed through one of the smooth threshold functions described
above, called the logistic, or sigmoid, function, which squeezes the output
toward 0 or 1.
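A minimal Python sketch of that idea; the weights, the bias, and the 0.5 decision boundary here are made-up values for illustration, not a real spam filter:

import math

def sigmoid(z):
    # Squashes any number into the range (0, 1) smoothly but rapidly.
    return 1 / (1 + math.exp(-z))

def predict_spam(spam_score, m=1.5, b=-3.0, boundary=0.5):
    # A linear relationship (mx + b) fed through the sigmoid gives a
    # probability; the decision boundary turns it into a yes/no answer.
    probability = sigmoid(m * spam_score + b)
    return probability, probability > boundary

# Hypothetical composite spam scores for two emails.
print(predict_spam(4.0))   # high score: probably spam
print(predict_spam(1.0))   # low score: probably not spam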
Of course not every question is binary. But what is the core characteristic
of a binary output? My answer to this question is that it's discrete:
something is either 1 or 0, with no in between. In reality this can be
nuanced depending on the application. In the case of email spam, the model
itself only offers the probability that the email is spam; it is the final
output that is binary.
There can be discrete outcomes with more than two choices. You can have
three, four, or any number that you like. The beauty of the logistic
regression model is that you can expand it and extend it as far as you need
to in order to build your model. This is called multinomial logistic
regression. In this case there is no ordering but there are discrete
possibilities.
For example, we could use it for a study of voting results. There are discrete
possibilities (Democrat, Republican, Green, Libertarian, Did Not Vote …).
You can also envision building a model that tried to predict what type of
diet people followed based on various input variables. For example, you
could have (vegan, keto, carnivore, Mediterranean). Your training data
would be a set of people that actually are following one of the diets, and the
task for machine learning would be to seek out patterns in the data that
could be used to predict what diet a person is likely to choose given the
input characteristics. This makes sense to use for a logistic regression
model because the relationship is going to be statistical and not absolute.
Therefore there would be thresholds for each possibility.
It’s also possible to use ordered data sets. This case is known as ordinal
logistic regression.
Decision Trees
A decision tree can be used for categorical or numerical data. You can think
of moving through the decision tree by asking a question at each branch, or
node. These are called decision nodes, and they split into two or more
branches. A leaf node represents a classification or decision. The best
predictor is the root of the tree, or the root node. At each node, some
question is answered in yes-or-no fashion, and the algorithm proceeds
through the tree until it arrives at an answer. These trees are usually
presented "upside down", with the root node at the top of the tree.
A machine learning model can learn a decision tree by splitting the data into
different subsets based on different attributes. This is done using a process
called recursion, where a function calls itself on the revised inputs that
come from its own outputs. Target variables are set based on the values of
attributes, and the recursion continues until everything at a given level has
the same value of the target attribute.
One area where decision trees can be used effectively is with simple
medical diagnoses. For example, we could determine whether a patient has a
bacterial sinus infection or a viral one. The decision tree would ask
multiple questions that were arrived at by training the machine learning
algorithm; the training enables the algorithm to find patterns in the data
that can be used to determine whether a specific patient is likely to have
a viral or a bacterial infection.
It could start by asking whether the patient has a fever over 100 degrees.
If the answer is no, the patient probably has a cold virus. If yes, it's
possible the patient has a bacterial infection, but they may instead have a
viral infection such as the flu, for which the advice is simply to go home
and rest. Therefore more questions are necessary on the tree; for example,
the next question might be how long the fever has lasted. Using the same
procedure, the machine would continue asking questions related to the rules
doctors use to estimate whether an infection is bacterial or viral in nature.
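To see what learning such a tree looks like in code, here is a minimal sketch using the scikit-learn library (assuming it is installed, for example with pip install scikit-learn). The tiny fever data set is invented purely for illustration and is not real medical data:

from sklearn.tree import DecisionTreeClassifier

# Invented data: [fever in degrees F, days sick] -> 0 = viral, 1 = bacterial.
X = [[98.6, 2], [99.1, 3], [101.5, 8], [102.0, 10], [100.8, 9], [98.9, 1]]
y = [0, 0, 1, 1, 1, 0]

tree = DecisionTreeClassifier()
tree.fit(X, y)   # the tree is learned by recursively splitting the data

print(tree.predict([[101.2, 9]]))   # classify a new, unseen patient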
Since the primary use of decision trees is simple classification, they are not
computationally intensive. As you can see from the examples given, a
simple check of a question and answer using the attributes of a data point is
all that is necessary.
However, decision trees are best used with simple classification problems.
If there is a large number of classifications, errors can result. Decision trees
are known to overfit training data.
Random Forest
A random forest is a variation on the decision tree that attempts to overcome
its weaknesses. In many applications, a decision tree will be more than
adequate for the task at hand. However, if the task is complex enough to
bring out the weaknesses of the decision tree, a random forest will be more
appropriate. A random forest creates multiple decision trees during training
and combines their outputs, and one advantage of this is that it is far less
prone to the overfitting problem that decision trees can have.
Nearest Neighbor
The k-Nearest Neighbor model can be used in either regression or
classification problems. Although it’s used in both, it’s normally used more
frequently in classification problems. To apply the algorithm, each object
of study in the training data is assigned a classification, and a "vote" is
tallied among the neighbors that are nearby.
First, the distance between the new data point and every point in the
training data is computed. Then the data is sorted by these distances,
arranged from smallest to largest. If the distance is small, then two data
points are alike, and if the distance is large, the two data points are not
alike. This is where k comes in: it determines how many of the nearest data
points are counted as similar.
You use k to pick out the top k rows of the data. Since the data is ordered
by distance, this picks out the k nearest neighbors; remember, the top row
of the data is the smallest distance between data points.
Now we have our "vote". This is done by determining the most frequent
classification among the top k rows. That is the predicted class.
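The steps just described translate almost line for line into code. A minimal hand-rolled sketch in Python, with a made-up two-feature data set:

import math
from collections import Counter

def knn_predict(training_data, new_point, k):
    # Step 1: compute the distance from the new point to every data point.
    distances = []
    for features, label in training_data:
        distances.append((math.dist(features, new_point), label))
    # Step 2: sort by distance, smallest first, and keep the top k rows.
    distances.sort(key=lambda pair: pair[0])
    top_k = [label for _, label in distances[:k]]
    # Step 3: the "vote": the most frequent class among the k neighbors.
    return Counter(top_k).most_common(1)[0][0]

# Made-up data: ([feature1, feature2], class label)
data = [([1, 1], "A"), ([1, 2], "A"), ([5, 5], "B"), ([6, 5], "B")]
print(knn_predict(data, [2, 1], k=3))   # "A": its nearest neighbors vote A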
Chapter 7: Tips for Your Work in Machine
Learning
Choosing the wrong algorithms could not only lead you in the wrong
direction and waste time, but companies that are hiring data scientists
aren't going to be interested in people who make that kind of mistake.
People are often excited to jump into the advanced material, learning the
algorithms in detail and coding them up, but if you don't have a solid
foundation in place from the very beginning, you are unlikely to have much
success. People are not going to be interested in working with someone who
doesn't have a firm grasp of the fundamentals.
A good exercise is to review famous cases online, and see what algorithms
were chosen for them. This way you can learn from the experts who have
already been working in the field. You will want to compare different types
of problems. For example, image detection, facial recognition, logistics,
approving someone for a loan, or showing someone an online
advertisement. Each of these problems is quite different in nature and
scope. Even more to the point, they are going to require different
approaches when data science is applied.
Try looking at many different problems in this way so that you can gauge
how good you are at finding the correct way to view a problem. This is the
first step in being successful.
Knowing Which Type of Learning to Use
The learning portion of the process of developing a model is one of the
most important. You need to understand why you would choose supervised
over unsupervised learning, or vice versa. Remember that the question is
not whether one is ultimately better than the other; the question is whether
one is better suited to the problem at hand. One case is obvious: if you
have the answers to the questions, that is, you have the output data for
the inputs in your data set, then supervised learning is an option, since
supervised learning is the case where the system is given the outputs. But
don't stop there. Depending on the nature of the problem at hand, you may
or may not actually want to give the answers to the system. In some
circumstances, even though you have them, it might be better to use
unsupervised learning instead.
Also remember that you can use a combination method: you can supply the
model with the answers for a subset of the training data, and then use
unsupervised learning for the rest of the data. This is often called
semi-supervised learning, and it is one area you should consider for further
study after finishing this book. Learning which approach is best is an
important skill, and if you lack judgment in this area you are unlikely to
go very far with machine learning.
One way to test yourself is to follow the suggestions given in the previous
section; that is, look at what people did in previous examples. You can
start by looking at simple, artificial cases, but you should become familiar
with how this has been done in the real world. Rather than just looking up
the answers, challenge yourself. Begin by simply looking at how the problem
is framed. Then work out for yourself the right way to proceed when it comes
to the learning. Only after you have done this should you go and find out
what people decided to do in the real world.
Data Selection
Many people get overly excited about the algorithms and the process of
machine learning itself. Equally important, however, is the data that you
select to use in your training. If you choose the wrong data, your model
may deliver bad or irrelevant results, and your career as a data scientist
will be short-lived indeed if that is the case.
You’ll also want to focus on using data sets that are the right size for the
right problem.
One way to get a handle on this is the same approach that you would use in
any other situation. Look and see how other people have handled this
problem, in particular for real world cases. Read any literature you can on
case studies to help educate yourself on these issues. Being able to choose
the right test data is as important as deciding whether or not you should use
supervised or unsupervised learning, and what algorithm you should use in
a particular case.
It’s important to learn from the experiences of others, seeing what types of
data are best suited to different algorithms. Sometimes, you might let the
data choose the algorithm for you. At others it’s going to be the other way
around.
Sometimes you’re just going to have to let the results speak for themselves.
When you run your tests with a given data set, if you are not getting the
results you are expecting, you might have to re-examine your choice of the
data. It could be that you are using the wrong algorithm but you might also
be in a situation where the data set just isn’t right. Maybe the data set isn’t
large enough, or it might not have the right attributes. You will have to
evaluate each case independently.
At other times, you might actually be using too much data. This can be in
an absolute sense, for example perhaps you are feeding it too many rows.
On the other hand the problem might lie with the attributes. It’s more than
possible to try and feed an algorithm too many attributes. There is a sweet
spot in every single case. An algorithm has a need for a proper number of
attributes in order to find patterns in the data. If it can’t find the patterns, it
might be because there aren’t enough attributes.
On the other hand, if we feed it too many features, the algorithm might not
do so well in its pursuit of finding the meaningful patterns hidden in the
data. Suppose, for example, you are studying diabetes rates and the
birthplace of each person is also included in the data set. Is that really
relevant to the question at hand? It is probably only of tangential
interest, unless you're specifically studying whether people born in certain
locations are more likely to develop diabetes. But for determining whether
one gender is more likely to get diabetes than the other by a specific age,
that data point probably has no relevance whatsoever, and including it is
going to reduce the efficiency of your model.
A good way to attack this problem from the front end is to do an analysis of
every attribute and feature in the data. Simply take a look at each attribute,
and ask yourself whether or not it’s important for the question that you are
asking in this particular test. If the answer is no, then jettison it. You can
always add it back later, but you will save yourself a great deal of
headaches by taking a proactive approach to your setup.
Tools to Use
It turns out that you don’t have to be the world’s best programming expert
in order to succeed with data science, although I advise getting a formal
education. The good news, however, is there are some relatively simple
tools that you can use in order to become well acquainted with this field. In
order to do data science you don’t have to become an expert on building
large scale software projects using object oriented programming. In fact
when you get down to brass tacks, while the contributions of computer
science to the field are important – and I urge you to get a solid background
in it – understanding the algorithms in a way that might come better from
statistics and probability is going to be more important.
One of the most popular tools used for machine learning is Python. The first
thing to note about Python is that it's a lightweight tool. Second, you can
use it on any system. Python is quick to develop in and great to use for
calculation, and it can also be used to prototype more complicated software
that might later be written in a more sophisticated programming language.
There are many books available on Python; find the one that is most
suitable for your tastes and experience. I recommend taking it step by step.
Nobody should try rushing through the process of learning any technical
field, and that includes data science and machine learning. Going step by
step means that you should find a solid book and work through the
examples, but you should choose books that are going to challenge you to
work on your own projects that gradually increase in complexity.
There are also many well regarded free resources that can be used to teach
yourself how to do coding with Python. You can find many videos on
YouTube, or consider taking a low-priced course on Udemy. You might also
check out Khan Academy, which remains free and offers a huge library of
videos that will help you master nearly any subject.
The reality is that you should not select one tool or another. You should
learn both Python and R. In the competitive world of data science, you are
going to be better off if you have become a master of both tools. After you
have learned them, then you can switch between one or the other as
required by your work. You can have your preference, and that is fine. But
remember that people that you work for may have their own preferences
that might be different.
You should also not assume that you are going to be able to work using
Python and R all the time. This is why I strongly recommend that people
get a diverse background in multiple programming languages. Different
situations may call upon the use of different programming languages, and
you may be required to know more sophisticated programming languages
as a part of your job requirements. The best way to deal with this is to
prepare ahead of time, but to have a realistic picture. When I was in school,
there was a much smaller set of programming languages. Today they have
proliferated, but the utility of so many different languages is very
questionable indeed. What you should do is make sure you are comfortable
in the top 3-5 programming languages. You don’t have to be an absolute
expert in each one, but if someone were to hire you for a job using one of
the languages you should be able to tell them you know the language, and
you know it well enough that it’s not going to slow you down on your job.
Again, one of the best ways to take care of this is to get a degree or at least
a minor in computer science. Doing so, you’ll probably be exposed to at
least three different programming languages, but that will depend on the
university that you attend.
Another reason that I recommend computer science is that it’s not really
about the programming languages. Success in science or engineering
actually comes from your ability to solve problems, and this is what
computer science actually teaches. Computer science isn't about 'coding' or
specific computer languages; it's about teaching you how to think as a
computer scientist, that is, training your brain. For many people it's a
very challenging curriculum. One thing that I will guarantee is that someone
who goes through it is going to be able to outcompete someone who simply
learns 'coding' on the internet. So learning to code using internet
resources is a good way to get started, but you should not cut any corners
on the way to a career in data science, if that is your objective.
Practice
In the beginning, start simple. Like I suggested, starting with a good book
that has a lot of examples that you can work through is important. You
should practice coding your own models using Python or R, and then you
should seek out sample data which is available online. There is nothing like
practice; remember, they say practice makes perfect. A book that came out a
few years ago, Malcolm Gladwell's Outliers, asserted that great people
become great at what they do by devoting 10,000 hours to their chosen
pursuit. If you play golf for 10,000 hours, you're more likely to be able
to compete with Tiger Woods than if you just rely on your own natural
talent. Unfortunately, many people see the value of playing golf for hours
on end, but they don't apply the same logic to technical and professional
topics. You might consider bucking the trend,
and devote enough time and energy to data science and machine learning as
you would if you were a member of a professional sports team. You can
practice and study enough to become an expert in the field. When you do
that, you are going to be someone who is highly sought after, and you are
going to be better at your job and be able to deliver the kinds of results that
people are expecting. One of the challenging things about data science is
that while it’s in high demand, and there are lots of jobs all around, if you
don’t deliver you aren’t going to survive.
Utilize Mixed Learning Models
An important tip that has served me well is to try mixed learning models
when I don’t seem to be getting the results I expect from one type or
another. If you are not seeing results from supervised learning, then
consider using a subset of your data for unsupervised learning, and vice
versa. There are many options to consider; remember that there are many
hybrid, semi-supervised variations. The best one to use will depend on the
circumstances.
This is a great way to breathe new life into a project that seems to be
going nowhere. If something is stuck, it can help you revitalize things and
get the project off the ground. Don't be afraid to try multiple approaches.
One of the worst things that can happen to any engineer or scientist is to
get caught up in one or two beloved approaches that they are afraid to give
up, or to be afraid to try something new.
Have Realistic Expectations
It’s important to have realistic expectations. Not every case of machine
learning is going to be the next big breakthrough. In fact, in some
applications machine learning might produce disappointing results. If that is
the case for one of your projects, don’t be afraid to cut it loose. This may
mean changing algorithms, refining the test data, or even abandoning the
project altogether. Sadly, there has been a great deal of hype surrounding
machine learning, and as excited as I am about it, it's hard to deny that
sometimes the hype is a little much. That doesn't mean I'm abandoning the
field. It just means that you need to keep things in perspective.
Chapter 8: The Future of Machine Learning
Machine learning has been one of the most exciting developments to come
out of computer science and artificial intelligence in a very long time. Of
course machine learning began its long road to its present form many
decades ago. But it’s only recently that we’ve seen machine learning getting
widespread application to the extent that it’s actually changing the way that
society is operating. Understanding these changes is going to be one of the
most important things going forward for the data science community and
society at large. I try to stay optimistic; there are some reasons for
concern, but I have to keep believing that machine learning is going to
provide many tools that will help improve people's lives. Not every tool
that machine learning develops has to be the most dramatic breakthrough that
man has ever seen. Looking back through history, I would have to say that
people like Nikola Tesla or Thomas Edison were probably a lot more impactful
on society than any single one of the new technologies we are seeing today.
However, when you take a look at the sum total of the changes coming from
machine learning, society is going to be transformed a great deal in the
coming decades.
One of the things that we are definitely going to see is a great number of
jobs being eliminated. We have already seen the development of robots that
can stock and manage a warehouse just as well as any human worker. They
haven't yet been widely deployed, but it's only a matter of time before
that happens. Second, we've already seen the development of robots that can
do many menial labor jobs. The most famous of these is a robot that is able
to cook hamburgers in a fast food restaurant. This may be unfortunate for
those pushing for a $15 an hour wage for that type of work: right now the
robot is too expensive to be practical, but if wages keep rising, at some
point the robot becomes a cost-effective investment. Regardless of what
happens with wages, the downward pressure on costs that usually comes with
technology almost ensures that robots doing menial labor are going to take
a lot of jobs over the next 5 to 10 years.
Remember that this is nothing new, however. I hate to bring it up yet again,
but people in the 18th century had great fears of losing their jobs to the
new machines that were then making their way through society. The concerns
of those people turned out to be totally misguided; in fact, they were dead
wrong. Of course, that doesn't mean we should mock anyone who has concerns
about lost jobs now, nor does it mean we should dismiss them. We can't
assume that because jobs were created in much larger numbers after past
technological changes, this is always going to happen. However, if I had to
bet on it, I would say that's probably the case. One thing that people are
really good at is finding new things to do. Look around you and observe all
the things that we do now that didn't exist even as a thought 50 or 100
years ago. As an example, consider the video game industry. Today it
generates billions of dollars and employs tens of thousands of people.
Every time human labor is liberated, new uses for it are quickly found.
It’s hard to say where the future machine learning lies, but one thing it’s
going to do is allow people to have a more personalized existence. We’ve
already seen great strides towards this over the past decade or so. Now
everything is personally curated from music to videos. This process is in
their early stages of development. It’s only going to accelerate in the
coming decades.
Another thing were likely see is the application of machine learning to more
and more areas throughout life. The growth of data science and machine
learning has been explosive in the past 10 to 15 years. That trend is likely to
continue.
One very important factor, which I will call the ace in the hole, is quantum
computing. It's unclear at this point whether quantum computing will become
practical or remain in the realm of theoretical investigation. If practical,
functional quantum computers are ever built, it will be a game changer on
the scale of the Industrial Revolution. How quantum computing is merged with
machine learning could be one of the most interesting intellectual
challenges of the coming century. There is no question that if quantum
computing becomes practical, life is going to be very different afterwards;
the changes that we're experiencing right now are going to seem trivial by
comparison.
Conclusion
Machine learning is becoming more and more important as time goes by.
This is an exciting time to enter computer science and data science and
learn how to use these tools to complete amazing tasks. Although machine
learning seems brand new, the concept of machines and learning goes back to
the dawn of time. Human beings have always been tool makers and tool users,
and today's machine learning systems and artificial intelligence are merely
the latest 'iteration' in the long history of humans extending their minds
through the use of tools.
The same holds true today, as it always has. Each new generation brings up
the old Luddite ideas, only to see them die yet again. This happened in the
early 1980s when the invention of spreadsheets and accounting programs
on personal computers led people to a state of fear where they imagined
accountants and office workers would no longer be necessary. Then they
did the same thing with word processors, imagining that secretarial work
would be eliminated. Neither happened.
Now here we are again. After languishing for decades, artificial
intelligence is finally coming to life. Robotics is enhancing the capability
of manufacturing companies to produce more output with less cost and labor.
Despite the fact that as I write this the economy is at full employment,
politicians and people in the media are once again making the Luddite
charge: this time, artificial intelligence and machine learning are coming
for your jobs. The short-sighted people who promote these fears miss the
point entirely; they fail to see that the changes the world is currently
going through are going to do what such changes always have, which is raise
productivity and free people to do other things.
Hopefully they won’t stop this exciting engine. I suspect they won’t be able
to. Governments and corporations are already in love with machine
learning. The reason is simple. Machine learning works. After decades of
oversold promises, computers are now in a position where they can learn,
and once trained, they work autonomously. Machine learning makes
corporations run more efficiently and helps government identify fraud,
prevent cyberattacks and do a million other things. Since intelligence is
general, the applications of machine learning and AI are general.
AI holds out a special fear for people who hold the Luddite perspective. It’s
one thing to say that a machine is going to destroy your job, but people also
have unrealistic fears of machine learning systems. They seem to think that
AI and robots are going to take over the world as if they are living in some
kind of science fiction fantasy. There is no doubt that as the years go by
machine learning is going to continue to improve and be able to take on
more tasks, but what those tasks are used for is entirely in human control.
These tools can be used for nefarious purposes. In the People's Republic of
China, machine learning tools like facial recognition are being used to
control and harass people. Tools that in the United States and Europe are
used to determine someone's creditworthiness for a loan are being used in
China to give people a 'social score'. According to news reports, the social
score can be used to keep people from traveling or being able to leave the
country.
At this point I’d like to personally thank every reader for taking the time to
read the book and making it all the way to the end. I hope that the book has
been educational, and that you’ve learned a great deal about machine
learning and how it works. If you are new to machine learning, my hope is
that you’ve come away with some of the mystery behind it stripped away.
It’s not as complicated as it might seem when you are first exposed to the
ideas, or just hearing about them in the media.
If you read this book just to become more informed on the topic, I hope that
this book has satisfied your curiosity. If you are really interested in machine
learning and would like to become educated in it and possibly pursue it as a
career, the best place to start is to learn how to code. In order to get a job in
this competitive field you are probably going to need some kind of
academic degree and this is one of the few areas in college where you can
go to school and directly learn the skills you are going to need on the job.
But you can get started by learning Python, which is a simple programming
language that runs on any system. You can get online and search for machine
learning algorithms built in Python that use small data sets to start giving
you practice in this field.
I would also recommend taking a few business courses. You will want to learn
about things like logistics, where data science is often applied to generate
solutions for large corporations. It works well, and one thing I can
guarantee is that you will be able to get a job doing this if you get the
right education.
How far you go is up to you; the more education you have, the higher the
level at which you are going to start, and these types of fields can be
very competitive. A master's degree would be great, and for those who want
it, a PhD will put them in the best position. If you have followed my advice
about a double major, you can pick one or the other for a master's degree;
it's not necessary to keep collecting academic credentials in both.
Thanks again for reading, and please visit Amazon and leave a review for
this book!
Python for Beginners
A Step by Step Crash Course to Learn
Smarter the Fundamental Elements of
Python Programming, Machine
Learning, Data Science and Tools,
Tips and Tricks of This Coding
Language
Table of Contents
Introduction
Chapter 1: Introductory Chapter
What Is Python?
Which Are the Advantages of Using Python?
Differences Between Python and Other Programming Languages
Introduction to Data Science
Machine Learning
What Is Machine Learning?
How Is ML Classified?
What Are Some of the Current Applications of ML?
Artificial Intelligence (AI)
Fundamentals of Artificial Intelligence
Why Is Artificial Intelligence Used?
Artificial Intelligence and Programming Languages
Python and Artificial Intelligence
What Are the Features and Advantages of Python with AI?
Characteristics of Artificial Intelligence
In Which Areas Is AI Applied?
Why Python? Why It Is Highly Recommended in the Above Topics?
How to Start Installing Python and Its Interpreter?
Using Python with Shell and IDLE
Chapter 2: Basic Concepts
How Can We Declare a Variable?
Data Types
Operators
Interactivity and Its Importance
Basic Functions for Interactivity
Standard Output with print()
Standard Input with input()
Chapter 3: Conditionals
If Statement
Else Statement
Elif Statement
Chapter 4: Loops
While Loop
Statements Used in the While Loop
For Loops
Chapter 5: Functions
Some Python Functions
How Can I Create My Own Function?
Parameters
Return Statement
Lambda Function
Filter() Function
Map() Function
What Is the Difference Between Def and Lambda?
Variables
Global Variables
Local Variables
Chapter 6: Modules
Why Should You Use Modules?
How Do We Create a Module on Python?
Import Statement
How to Import a Module?
Chapter 7: OOP (Object-Oriented Programming)
What Is a Class?
Chapter 8: File management
How to Access a File?
Open() Function
Read([ ]) Method
Readlines() Method
Close() Function
What Is a Buffer?
Errors
Encoding
Newline
Handling files with the "OS" module
xlsx Files:
Handling PDF files
Handling of BIN Type Files
Chapter 9: Exceptions
What Is the Possible Solution to This Problem?
Chapter 10: Other Topics
Python and Serial Communication in Electronic Devices
Python and the Databases
Python and Graphical Interfaces
Conclusion
Introduction
This book is made for all those interested in learning from scratch how to
start coding and programming in Python. If you already have some knowledge
of this language or another one, you will still find interesting and
important information here that we are sure you didn't know.
In the following chapters of this book you will be able to get started in
the world of programming, beginning with an introductory chapter where you
will learn about Python's differences and advantages, its uses, some
information about machine learning and data science, and even how to install
it for your operating system. After the introductory chapter come all of the
fundamental elements of Python, like data types, operators, control
statements, loops, functions, modules, OOP (object-oriented programming),
file management, and some other extra information.
There are plenty of books on this subject on the market, so thanks again for
choosing this one! Every effort was made to ensure it is full of as much
useful information as possible. Please enjoy it!
Chapter 1: Introductory Chapter
What Is Python?
Python is a very versatile interpreted language developed by Guido van
Rossum. It is a high-level, object-oriented programming language with a
user-friendly approach, dynamic typing, and easy-to-interpret syntax. All
these features make this language ideal for scripting, and we can also use
it to make applications in a variety of areas.
and, as, assert, break, class, continue, def, del, elif, else, except,
finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or,
pass, raise, return, try, while, with, and yield, among others, are some of
the "keywords" that you will use in this programming language.
Python has the advantage of being free and open source with easy access, a
quite large standard library, and a very large community of programmers,
making it much easier to use; it also offers better error checking than
programming in C. This is a very high-level language. Among its many other
features, Python will allow you to separate the program into modules that
you can later reuse, and it comes with a collection of standard modules that
you can use as a base for your programs.
Python is also an interpreted language; this means that you will not need to
compile or link your programs in the usual way, and you can use the
interpreter interactively. The language allows you to write compact and
readable programs; Python programs are considerably shorter than equivalent
ones in C++, for example, and this is largely due to the syntax used.
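As a small taste of that compactness, here is a complete Python program (an invented example) that would take considerably more lines in C++:

# A complete program: keep the even numbers from a list and print them sorted.
numbers = [7, 2, 9, 4, 1, 8]
evens = sorted(n for n in numbers if n % 2 == 0)
print(evens)   # [2, 4, 8]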
Which Are the Advantages of Using Python?
As the main advantage, we can mention that this language simplifies
programming a great deal by proposing a pattern to follow, which is why it
is considered a great language for scripting. It is a scripting language,
which means you don't have to declare constants and variables before using
them. Also, as it is an interpreted language, when you run the program the
lines are not all read at once; only the ones being executed at that moment
are, so it can be said that it goes line by line, which makes programs more
manageable. It offers high development speed; it can be used on multiple
platforms; and it is an open-source language, so communities and users can
create their own versions or create add-ons that may or may not be added to
the following versions of Python.
In the case of Python and JavaScript, both are interpreted languages, so in
practice you will have a file, write your code in this file, and save it so
that the computer can execute it. But for this to happen you need an extra
program that interprets the code and allows the computer to read it, and
that is what an interpreter does. If we install the interpreter on Linux you
will be able to run the code on Linux, if we install it on Windows you will
be able to run the code there, and the same goes for Mac. Both languages
are therefore multiplatform.
Both languages are open source; this means that you can see how the code of
the standard libraries is written, so you will be able to see more easily
how each of these programs is developed.
Now let's talk about the differences between the two languages. JavaScript
(JS) was a language born to add interactivity to the browser. The
functionality that the JS language itself brings is minimal; since it lacks
a large standard library, it is complemented with another tool called npm,
through which you can download code from the internet to fill in that
functionality, because JS does not include all of the tools. This puts it
at a disadvantage compared to Python, because you will find a wide variety
of packages and have to choose which one to use. In Python, you will find
certain types of applications in which it is unique. In JS, the most popular
type of application by default is the web application, in both the server
and browser environments.
In Python, you will find packages with specific functionality, while in JS
it is easier to find several packages that do basically the same thing.
Today, data and information are often represented by binary codes. A digital
system such as TV, telephone, and so on, stores, transfers, and processes the
information in binary code.
Our universe moves around data: scientific data such as astronomy and
genomics; in the social sciences, historical data and digital books; in
companies and commerce, corporate sales data, market transactions, censuses,
airline traffic; in entertainment, data from images, movies, mp3 files, and
games; in medicine, data from patients, scanners, and results.
In this sense, Data Science has as its main objective to model, analyze,
understand, visualize, and extract all the possible knowledge from the data
we have available.
The procedure is handled in several stages or steps until we obtain the
final results. The first is when we focus on understanding and framing the
problem we are going to evaluate. This is fed back with an understanding of
the data associated with the problem we are evaluating, and this whole
process is done iteratively. Then comes the stage of preparing the data we
handle, thus creating our database, and then building the models.
Finally, we arrive at the final step, where we evaluate the results obtained
and see whether the model generated with the chosen technique is correct.
If the model is indeed adequate, we proceed with its implementation. If the
opposite happens and the generated model is not the one we are looking for,
the process returns to the initial state, and we start the iterative process
again.
Fundamentally, if you want to join the world of Data Science, you must have
knowledge mainly in modeling, visualization, databases, and programming. On
the analytics side, you handle statistics, artificial intelligence, machine
learning, predictive models, and natural language processing. You must also
know database management and data recovery, and be able to manage large
volumes of data, which is what we also call Big Data. You need knowledge in
areas of computer science such as programming, privacy, security, and
distributed systems, and, last but not least, the art of design, where data,
interpretations, and models must be visualized correctly.
Machine Learning
In the case of Machine Learning, we are talking about an area whose
objective is to develop algorithms that allow computers to learn.
Think, for example, of a streaming service that recommends films. The
algorithm learns not only from the films we have watched but also from the
ones we stop watching and the ones we add to our watch list. All this
information serves as a database for the algorithm to learn which films are
the most watched.
How Is ML Classified?
• Supervised Learning Algorithms: this type of learning occurs when
an algorithm learns from data from examples and associated target
responses, which may consist of numerical values or string labels, to
later predict the correct response when presented with new examples.
Examples include voice recognition, spam detection, and handwriting
recognition, among others.
• Unsupervised learning algorithms: refers to when the algorithm
learns from simple examples without any associated response, letting
the algorithm determine the data patterns itself. For example, detect
morphology in sentences, classify information, etc.
Warren McCulloch and Walter Pitts (1943) have been recognized as the authors
of the first work of artificial intelligence. They started from three
sources: knowledge of basic philosophy and of the functioning of neurons in
the brain, the formal analysis of Russell and Whitehead's propositional
logic, and Turing's theory of computation. (Alan Turing later, in 1950, made
a proposal designed to provide an operational and satisfactory definition of
intelligence.) They proposed a model made up of artificial neurons, in which
each neuron was characterized as being either "activated" or "deactivated",
with "activation" given as a response to the stimulation produced by a
sufficient quantity of neighboring neurons. They showed, for example, that
any computable function could be calculated by some network of
interconnected neurons, and that all the logical connectives (and, or,
not...) could be implemented using simple network structures.
Artificial intelligence became an industry from 1980 onward. In 1981, the
Japanese announced the "Fifth Generation" project, a ten-year plan to build
intelligent computers. In response, the United States created the
Microelectronics and Computer Technology Corporation (MCC), a consortium in
charge of maintaining national competitiveness in these areas.
AI has taken as its objective the study and analysis of human behavior.
Thus, AI applications are mainly found in the simulation of man's
intellectual activities: imitating, through machines that are mostly
electronic, as many mental activities as possible, and perhaps improving on
human capabilities in these respects.
The field is very extensive, and we can illustrate it with these three
points of view:
• Those who argue that it is possible to make "really thinking devices,"
a viewpoint called strong AI.
• Others who think that it is possible to simulate mental states (without
being mental states) of our brain by means of computers, a point of
view called weak AI.
• The "dualists", who give separately the dimension of body and spirit,
and in this way, there would be "truth judgments" to which computers
would never have access.
Java is one of the most important languages; an example that embraces the
fusion between Java and artificial intelligence is the system called WEKA,
which is a software platform for machine learning and data mining written
in Java.
And last but not least, thanks to its simplicity, Python is considered one
of the best programming languages for artificial intelligence. Unlike
languages such as Java, C++ or Ruby, Python proves to be more effective and
allows the user to save time. A well-known AI example is PyBrain, a powerful
library with flexible algorithms for machine learning; it specifically
contains algorithms for neural networks.
Python can handle matrices, arithmetic, objects and variables, as you will learn in the following chapters; it also has automatic memory management, accepts a wide variety of programming paradigms, and is available for all operating systems.
Python has libraries such as NumPy, PyBrain and SciPy, which are used for scientific computing, advanced computing and machine learning. In addition, you can use an IDE for code checking, which is very useful for developers working with different algorithms.
As for AI and robotics, in this case we are talking about devices composed of sensors that receive input data and tell the robot to perform a certain action. When we talk about AI in this sense, it is enough to think about chatbots: more and more, we turn to AI to meet the needs of customers by computer. This is what we know as "Natural Language Generation," a sub-discipline of AI that converts data into text and allows computers to communicate ideas with impressive accuracy.
Python has approximately six main libraries and many others, but we are going to explain specifically which are the most important. For visualization, we find libraries such as Matplotlib, Seaborn and Bokeh. Data visualization allows us to better understand the data we are processing or analyzing, and these libraries are very good at handling this type of information. Matplotlib is Python's graphical library, generating excellent quality graphics, whether time series, histograms, power spectra, bar charts, error charts or others. Seaborn specializes particularly in the visualization of statistical data, offering a high-level interface for creating visually attractive and informative statistical graphs. Bokeh visualizes data interactively in a web browser, where we can also create interesting, interactive graphs.
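For example, a minimal Matplotlib sketch (the data values and labels are invented) could look like this:

import matplotlib.pyplot as plt

# Plot a simple bar chart from made-up values
values = [3, 7, 5, 2]
labels = ["A", "B", "C", "D"]
plt.bar(labels, values)
plt.title("A simple bar chart")
plt.show()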
In Windows and Mac, IDLE is distributed together with the Python interpreter, meaning that when we install Python on either operating system, IDLE will also be installed.
Now, to begin the installation for Windows and Mac users, we must first of all go to the website www.python.org:
Once you have entered this page, you will be able to confirm that you are on the official Python site. On the page you will see a menu; place the cursor on the tab that says "Download" and a submenu will appear, where you will be able to see the operating system you are using, which in this case would be Windows or Mac. If you are a user whose operating system is Mac, select the corresponding tab of the download submenu and you will see all the versions of this programming language available to date.
As for knowing which version to choose when downloading Python, there are several recommendations. When you select the download menu tab, you will observe on the right side the text "Download for Windows". One of the advantages of this website is that it is able to identify the operating system from which you are visiting, so by default it will recommend the version you should download for the operating system you are working on. Easy, isn't it? If your operating system is Windows, it will recommend downloading the latest version of Python for Windows, and likewise for other operating systems, whether Linux, Mac, etc.
Once you know which version you are going to download, for Windows in this case, select the option that says Python followed by the version number. At the time of writing this book, the latest version of Python was 3.8.0. It should be noted that every time this language changes version, generally only some new features are added, which you will be able to use in your programming code.
Once you click on the Python version you selected, you will see how it automatically downloads to your operating system. Once the download is finished, look for the file on your computer, which is usually in the downloads folder; this file is the Python installer.
In the case of Windows, right-click on the file and run the program; the Python installer interface will open immediately, and in this step it is advisable to select the box that says "Add Python 3.8.0 to PATH". This recommendation is due to the fact that PATH is an environment variable of the operating system which specifies the paths in which the interpreter must look for the programs to execute.
If you follow this recommendation and select the box, you will be able to use text editors and commands to program and execute your code in Python.
Finally, the system asks how you want to install the programming language. You can choose the customized installation, where the user decides what to install and how, or you can choose the option recommended by default, which simply sets the path where Python will be installed and also tells you that it includes IDLE, which, as explained above, refers to the integrated development environment.
Then, when you select "Install Now", the installation on your operating system begins. Once the installation is complete, verify that it actually succeeded.
A quick way to find Python is to type the word IDLE into the Windows search box; you will immediately see IDLE and, when you open it, you will be able to confirm that the programming language is already downloaded and installed.
On Linux, Python is installed from the command line. When you press the "Enter" key, if your Linux operating system has a security key, it will ask you for it at that moment. Once you have entered the key and pressed "Enter" again, the system will try to install Python. If you already have it installed by default and it is an older version, it will be updated; if you have already installed the latest version, it will remind you that you already have Python and that it is the most recent version.
This way, you can be sure that you have Python on your Linux operating system. Afterward, it is recommended to verify that IDLE is installed, and to do so you have to write the following on the command line:
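The exact command depends on your distribution; on Debian- or Ubuntu-based systems, it would typically be something like:

sudo apt-get install idle3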
By pressing the "Enter" key, the system will install IDLE if the operating system does not have it, or update it if it has an older version, in the same way as in the previous case. So finally, you have Python installed, and also the development environment (IDLE) with which you are going to program in the Python programming language.
To open IDLE in Linux, type idle into a command terminal, and the environment that is going to be used for the language will appear on the screen.
Once you have the programming language on the operating system you are using, whether Windows, Mac or Linux (the three cases explained), you can start customizing your Python environment if you want, according to each user's liking. For example, if you select the Options tab in IDLE, an interface will open where you will be able to modify the font size, the color and the background of IDLE, to mention some of the things you can do.
The source code is written in the Python programming language, and this code must later be converted into an executable form. For this to happen, in other words, for the source code to be turned into something the machine can run, the help of a translator is necessary, so that the result can then be executed by the central processing unit (CPU); in Python's case, this happens with the help of an interpreter.
In summary, a compiler converts our source code into an executable file: it is a translator that transforms a program, or source code, into machine language so that it can be executed. This translation process is what is known as compiling.
When we open IDLE on our system, in the same way as before, we will see the window called the Shell, which we can also call the interpreter of our Python language.
Every time we open our interpreter, or Shell, we will always find a kind of header, which is always the same and contains Python information, such as the version we are working with and the date and time, for example. This format helps us recognize that we are working with the Shell interpreter.
We write our first line of Python code, starting with the phrase that is very famous among beginners, "Hello World", and we will do it in the following way.
The syntax is written as follows:
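>>> print("Hello World")
Hello World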
An additional detail about the interpreter is that it can also be used from the command prompt, which is available on Windows, Linux and Mac.
In order to use the interpreter from the command prompt, simply type the word python and press the "Enter" key. This way, the Python interpreter starts to run, and we know that we are effectively in the interpreter because, as in the previous case, we are going to see the same header as before.
In the Python language, the names of variables cannot coincide with the names of the commands belonging to this programming language; besides, variable names cannot contain blank spaces. What this means is that a variable cannot be called "print", for example, because "print" is already a command that performs an action in Python. Nor could it be called "na me" because, as mentioned above, it contains a blank space.
There are two types of variables that are common in Python, and they are as follows:
The variables that store numbers, which are subdivided into two types: integers (int) and real or decimal numbers (float).
In addition, all variables that store text are called "strings" (str), so the content placed within these variables must be in quotes.
How Can We Declare a Variable?
The first thing to do in order to declare a variable is to assign a name to it; when doing so, a memory space, better known as a variable, is created immediately. Then we place the equal sign after the variable name, to indicate to the program that it is going to store a piece of data in that memory space, and finally the data to be saved is assigned. In this way, following these indications, the data will be stored in the variable.
One of the advantages of Python is that it can identify the type of data to be
stored, which means that we do not have to tell the program that we are
storing an integer, as is usually done in C, because the language itself
detects it.
For example,
number_one = 2
In the case that we store a decimal or real number, what Python does is tell the variable that it is going to receive a real value, and therefore it must meet the characteristics needed to receive it. The same happens with text values, or "strings"; Python can identify the type of data to be stored.
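A minimal sketch of this automatic detection (the variable names are ours; type() is the built-in that reports a value's type):

number_one = 2          # Python detects an integer (int)
number_two = 2.5        # Python detects a real number (float)
greeting = "hello"      # Python detects a string (str)
print(type(number_one), type(number_two), type(greeting))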
Data Types
The Python programming language has several types of data: numeric, text, the type whose single value can be True or False, and some others. When we store data in a variable, the result obtained will depend specifically on the type of data we are working with.
Therefore, we can say that data types are what allow us to organize a kind of information so that later we can store this data in variables and then, by means of those variables, carry out the operations that the programmer requires.
It is advisable to know what type of data we are using, to enhance the scope of the code and to know how far we can go, because if we do not know what operations we can do with a type of data, we will often get errors when programming.
Numbers: This type of data is what allows us to perform all kinds of arithmetic operations, from adding numbers to comparing them. They are absolutely necessary to make functional programs because, as you know, there is mathematics everywhere, and programming is not an exception.
Among the types of data that make up the numbers, we have:
1) Integers (int): whole numbers, without a decimal part.
2) Reals (float): numbers that have a decimal part.
3) Complexes: This type of data is used because they have a real and an imaginary part; in real life, imaginary numbers are widely used in engineering, specifically in electronics, for the calculation of factors and phases. To declare a variable that contains complex data, the following syntax is needed, which is different from the one we have been seeing.
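In Python, the imaginary part is written with a j, for example:

complex_one = 3 + 5j       # real part 3, imaginary part 5
print(complex_one.real)    # -> 3.0
print(complex_one.imag)    # -> 5.0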
Strings: This type of data is extremely widely used, since it is the one that allows sending messages: it lets us assemble a collection of characters and send it. That is why this type of data is also called a character string.
The syntax for declaring variables that contain the character string data type is very simple, as you only need to write the string between single quotes or double quotes. To get a better understanding of this concept, let's look at the following example:
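A sketch consistent with the description below (the names concat and string3 come from the text; the string values are invented):

string1 = "Hello"
string2 = " world"
concat = string1 + string2                       # two strings concatenated
string3 = "I" + " am" + " learning" + " Python"  # four strings concatenated
print(concat)
print(string3)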
This type of variable, which holds strings, is used a lot to add interactivity to the program; it is used in both the input function and the output, which will be explained later.
As we can see in that example, two strings are created and then concatenated into concat, and in the next case the variable string3 is the concatenation of four strings.
Boolean: This type of data is used to perform logical operations, which are widely used in programming. It may be True or False, and from this, different programs can be made. This type of data will gain importance in the following chapters, where it is necessary to use conditions.
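For example, a boolean variable can be declared like this (the name is ours):

is_raining = True
print(type(is_raining))   # -> <class 'bool'>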
Lists: This type of data, or better said, this data structure, is very special, since it allows us to group several items inside it. Lists are extremely useful because they are used very often in programming.
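A sketch consistent with the description below (the values are invented):

list1 = [1, "hello", True]          # an integer, a string and a boolean
list2 = [1, 2, 3, 4, 5, 6, 7, 8]    # eight integers, from one to eight
print(list1)
print(list2)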
As we can see in that example, we created two lists; the first one has three items: the first is an integer, the second a string and the third a boolean. The second is a list that has eight items inside it, which are integers and go from one to eight.
But this type of data does not end here, as you can perform different operations with it, such as inserting new items at the beginning of the list, inserting them at the end, removing elements from the list, or even looking for a specific element within the list, as the sketch below shows.
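These operations, using list2 from the sketch above (all the method names are standard Python):

list2.insert(0, 0)    # insert a new item at the beginning
list2.append(9)       # insert a new item at the end
list2.remove(5)       # remove the element 5 from the list
print(3 in list2)     # look for a specific element -> True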
Tuples: This is also a data structure, which works in a similar way to lists. Tuples, analogously to lists, organize a number of items, but unlike lists, once created, the items cannot be modified; therefore, they remain as they are until the end of the program. One of the benefits of using tuples is that you work faster with them; in addition, since they cannot be modified, it is not possible to introduce a data error.
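A sketch consistent with the description below (the values are invented):

tuple1 = (1, "hello", True)   # items in parentheses instead of brackets
print("hello" in tuple1)      # searching with the in statement -> True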
As we saw, to create a tuple you only need to put the items in parentheses, unlike lists, which use brackets. Analogously to lists, you can search tuples with the in statement.
Dictionaries: As with the last two data structures, here we also find a structure; in this case, each value has a key, working like a dictionary. To understand this a little better, it will be important to see a simple example.
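A minimal sketch of a dictionary (the keys and values are invented for the example):

ages = {"Matt": 16, "Anna": 21}   # each value is stored under a key
print(ages["Matt"])               # look up a value by its key -> 16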
Operators
Operators that can be used in Python can be classified into five categories:
1. Arithmetic Operators
These are operators that have the ability to perform mathematical or
arithmetic operations that are going to be fundamental or widely used in
this programming language, and these operators are in turn subdivided into:
1.1. Sum Operator: its symbol is (+), and its function is to add the values of
numerical data. Its syntax is written as follows:
>>> 6 + 4
10
1.2 Subtract Operator: its symbol is (-), and its function is to subtract the values of numerical data types. Its syntax can be written like this:
>>> 4 – 3
1
1.3 Multiplication Operator: Its symbol is (*), and its function is to multiply the values of numerical data types.
Its syntax can be written like this:
>>> 3 * 2
6
1.4 Division Operator: Its symbol is (/); the result offered by this operator is
a real number. Its syntax is written like this:
>>> 3.5 / 2
1.75
1.5 Modulo Operator: its symbol is (%); its function is to return the remainder of the division between the two operands. In the following example, 8 is divided by 5, which equals 1 with a remainder of 3, which is why the modulo is 3.
Its syntax is written like this:
>>> 8 % 5
3
1.6 Exponent Operator: its symbol is (**), and its function is to calculate
the exponent between numerical data type values. Its syntax is written like
this:
>>> 3 ** 2
9
1.7 Whole Division Operator: its symbol is (//); in this case, the result it
returns is only the whole part.
Its syntax is written like this:
>>> 3.5 // 2
1.0
However, note what happens when both operands are integers. In Python 3, the / operator always returns a real number, while the // operator returns an integer; this way, you would have the following:
>>> 3 / 2
1.5
>>> 3 // 2
1
If we use whole division but want a real result, one option is to make one of our numbers real. For example:
>>> 3.0 // 2
1.0
2. Comparison Operators
The comparison operators are those used to compare values and return, as a result, a True or False response, depending on whether the applied condition holds.
2.1 Operator Equal to: its symbol is ( == ); its function is to determine whether the two values are equal, and if so, it returns True. For example:
3 == 3 Is True
3 == 5 Is False
2.2 Operator Not Equal to: its symbol is ( != ); its function is to determine whether the two values are different, and if so, it returns True. For example:
3 != 5 Is True
3 != 3 Is False
2.3 Operator Greater than: its symbol is ( > ); its function is to determine whether the left value is greater than the right one, and if so, it returns True. For example:
5 > 3 Is True
3 > 8 Is False
2.4 Operator Less than: its symbol is ( < ); its function is to determine whether the left value is less than the right one, and if so, it returns True. For example:
3 < 5 Is True
8 < 3 Is False
2.5 Operator Greater than or Equal to: its symbol is ( >= ); its function is to determine whether the value on the left is greater than or equal to the value on the right, and if so, the result returned is True. For example:
8 >= 1 Is True
8 >= 8 Is True
3 >= 8 Is False
2.6 Operator Less than or Equal to: its symbol is ( <= ); its function is to evaluate whether the value on its left is less than or equal to the one on the right, and if so, the result returned is True. For example:
8 <= 10 Is True
8 <= 8 Is True
10 <= 8 Is False
3. Logical Operators: The logical operators are and, or, and not. Their main function is to check whether two or more conditions are true or false and, as a result, return True or False. It is very common for this type of operator to be used in conditionals to return a boolean by comparing several elements.
As an aside about operators, the storage of true and false values in Python is of the bool type, named after the British mathematician George Boole, who created Boolean algebra. There are only two Boolean values, True and False, and it is important to capitalize them because, in lower case, they are not Booleans but simple words.
x > 0 and x < 8: this will be true if x is indeed greater than zero and less than 8.
In the case of or, we have the following example:
n % 6 == 0 or n % 8 == 0
It will be true if either of the conditions is true, that is, if n is a number divisible by 6 or by 8.
In the case of the logical operator not, what happens is that it negates a Boolean expression. So if we have, for example:
not (x < y), it will be true if x < y is false, that is, if x is greater than or equal to y.
4. Assignment Operators
Assignment operators basically assign a value to a variable, using the ( = ) operator.
4.1 Operator Equal to ( = )
This operator assigns to the variable on the left side whatever variable or result is on the right side.
The general syntax of this type of assignment is:
>>> a = 5
>>> a1 = a
So we can say that a1 equals 5.
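The example discussed next can be sketched as follows (the name var1 comes from the text; the prompt text is invented):

var1 = input("Enter a number: ")   # input() always returns a string
var1 = int(var1)                   # convert it so we can calculate with it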
In that example, we see that the variable that was initialized, var1, is equal to the entry the user types. It should be a number, since the prompt asks the user to enter a number, but the entry is going to be a string, because the function input() always returns a string; from there, we convert it so the calculations can be made.
Escape characters: These are certain character combinations which behave differently within strings, as they allow us to do things we cannot do easily otherwise, such as a line break.
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\a Bell sound
\b ASCII backspace
\f Page advance (form feed)
\n Line break
\r Carriage return
\t Horizontal tabulation
\v Vertical tabulation
\ooo Octal value character
\xhh Hexadecimal value character
Triple Quotes: They are used to place multiline character strings. This can be done with triple single quotes '''text''' or triple double quotes """text"""; an example of their use is as follows:
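A sketch consistent with the description below (the line contents are invented):

string = '''First line
second line'''
string2 = '''
Third line
fourth line'''
print(string + string2)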
In this example, we created the two variables, string and string2; we used triple quotes, in which we placed several line breaks without the need for escape characters, in this case without \n. Finally, we print on the screen the concatenation of the variables string and string2.
Chapter 3: Conditionals
For this type of case, we have the statements called "if", "else" and "elif". Conditionals such as if, else and elif in Python are mostly used to execute an instruction when certain conditions, one or more, are met. We can see the conditional as the moment at which the decisions to be made in our program are presented; depending on them, a piece of the program may be executed or not.
If Statement
This statement is responsible for evaluating a logical operation, which can
give a result of the type "True" or "False" and then executes a certain piece
of code as long as its result is true.
Now, how can we see this? Well, this statement is very useful when programming. Between these statements and loops, we can cover a large part of the code that exists today, so it is of the highest importance to understand the if. Imagine the hypothetical case that you are taking an admission exam for a university, and there is a program designed to enter the grades of everyone who has taken the exam and to indicate whether each one is admitted or not. If the grade is greater than or equal to the value expected to enter the university, the program will place the student in the database as a new entry to the university; in the opposite case, nothing will be done, and it moves on to the next one.
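A sketch consistent with the description below (the values are invented):

x = 5
y = 3
if x > y:
    print("x is greater than y")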
We observe that a simple evaluation of two variables, x and y, is made; the behavior of the program is that, if the condition is fulfilled, it prints the text, and otherwise it does nothing.
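The next example calculates the area of a rectangle; a sketch consistent with the description that follows (the prompt texts are invented) would be:

h = int(input("Enter the height: "))
b = int(input("Enter the base: "))
if h > 0 and b > 0:
    area = h * b
    print("Area: " + str(area))
print("Bye")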
The first thing we can observe in this example is that two variables were initialized, h and b, which come from user input; one is related to the height of the rectangle, and the other is related to the width of the base, respectively.
When we arrive at the condition of the if, we take into account two related events that must occur simultaneously in order to proceed with the mathematical calculation: first, that h is greater than zero, and also that b is greater than zero. Why? Well, both must hold simultaneously, because if not, three things can happen:
- h<0, which indicates that a negative area may arise.
- b<0, which indicates that we will also have a negative area.
- b<0 and h<0, in which case, even though we would find an area greater than zero, it makes no sense to say that a measure of length is less than zero.
In view of this, the only valid condition is the one just mentioned: that the two length measurements are greater than zero simultaneously. The moment this condition is True, we enter the if block, within which we calculate the area of the rectangle, then convert the variable into a string, and finally print the value of the area of the rectangle on the screen.
Finally, "Bye" is printed on the screen so that we know the program has completed successfully.
Now we can also make another example, which is very useful at the moment of dividing: since it is known that division by zero is not defined, we have to force the denominator to be different from zero, as we can see next.
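A sketch consistent with the description below (the prompt texts are invented):

d = int(input("Enter the denominator: "))
n = int(input("Enter the numerator: "))
if d != 0:
    print("Result: " + str(n / d))
print("Bye")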
The first thing we can see in this example is that we declare two variables: the first, which we call d, is related to the denominator of the division; the variable n, on the other hand, is the numerator of the division. It is clear that a precondition of the program is that a number must be entered, any number, with the denominator logically different from zero, but with no other restriction. A curious thing we can find in the code is that it uses the function int() and, within it, the function input(). Why? Well, because, as you should know, input() returns a string, which depends on what the user types; therefore, what we do is convert the string into an integer in a more compact way in the code.
Then, the condition that must be met is that the denominator should be different from zero or, specifically in code, that the variable d should be different from zero. In the case that this condition is fulfilled, the obtained result will be printed on the screen; you may observe that the string "Result:" is concatenated with the string resulting from the division between n and d.
Finally, a message is printed on the screen to show that the program has finished correctly and there is nothing to worry about.
Else Statement
This statement could be seen as a complement to the if statement, since it provides other code alternatives when executing a program if the evaluated expression is of the "False" type.
We can say, then, that this statement is very necessary, since it covers the case in which a condition is not met and you want to perform an action because of that. As we saw in the examples above, specifically the areas and the division, something more is needed: it is true that no error was made, but something is needed to tell the user that a wrong value was entered. This is one of the reasons why an else is necessary, but not only that; we can also take this to another level. Imagine that you are programming the monitoring of a fuel plant, under the hypothesis that the condition is that if there is no spill, a green LED turns on. If that condition is not true and the else does not exist, it would be a real disaster; therefore, in that case, we introduce an else to notify that a problem is occurring in the plant.
We cannot leave aside the previous examples of the if because, in this case, they are also important. We will focus first on the example of the area of the rectangle:
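A sketch consistent with the description below:

h = int(input("Enter the height: "))
b = int(input("Enter the base: "))
if h > 0 and b > 0:
    area = h * b
    print("Area: " + str(area))
else:
    print("Error")
print("Bye")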
In this example we can observe that, analogously to the if example related to the area of a rectangle, two variables are initialized, h and b. These variables require us to enter the values of the height and the base, and they are then transformed to integers by means of the function int().
Now, in the case that the condition is not satisfied, a string saying "Error" will be shown on the screen, in order to make it clear that an error has happened in the program.
Finally, to finish the program, a print is made which informs us that the program has finished, printing "Bye" on the screen.
On the other hand, if we were to redo the example of the division, it would be very similar to what we have just done, since the else would also print "Error", and the condition of the if would be equal to what was done in that example: what has to be met is that the denominator must be different from zero.
Elif Statement
This will be a combination of the above cases. It is used to chain a series of "else if" branches without having to increase the indentation.
If you have some experience programming, for example in C, you may be expecting the switch statement, but in Python such a statement does not exist; instead, you get the elif statement, because with it you can consider several cases and study the different possibilities that can occur under certain conditions. For that reason, it is important to use it, and if you are starting from zero in programming, you will gradually see its importance.
As we did with the else, let's do the examples of the calculation of areas and divisions, to continue seeing the power of this statement.
In this case, we will do the code example that calculates the area of the rectangle. For it, we will take more conditions into account.
In the previous codes, when you were asked to enter a number, have you tried entering a letter instead? If you have, you will have realized that an error is thrown and, therefore, the program cannot finish correctly. We show you this example to see how we can face these problems.
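A sketch consistent with the description that follows (the message texts are invented):

h = input("Enter the height: ")
b = input("Enter the base: ")
if not h.isnumeric():
    print("The height entered is not numeric")
elif not b.isnumeric():
    print("The base entered is not numeric")
elif int(h) > 0 and int(b) > 0:
    area = int(h) * int(b)
    print("Area: " + str(area))
print("Bye")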
The first thing we do is declare the variables h and b; for that, we use the function input(). Later, we enter the conditional part, and the first thing is to verify that what the user enters is a number: isnumeric() returns True when h is numeric, and we then negate this condition. Why? Well, because we only want to enter those blocks to report a data error, in the case that the variables are not numeric. When a variable is not numeric, the negation yields True, and we enter the branch that prints on screen that the entered value is not numeric. This is done both for h and for b.
Finally, it will print on the screen the string "Bye" to tell the user that the
code ran satisfactorily.
But you may wonder: "Why, in this case, was there no comparison to check whether the number was negative?" This was not done because the isnumeric function only returns True for natural numbers, by which we mean from zero onwards.
Then, with the other example, the one of the division, we can say that the process would be very similar, with the difference that it would be necessary to add the comparison that the denominator has to be different from zero; as for the other conditions, we can say the code is similar.
As has been done throughout the book, this type of statement can be used in different cases, from hobby projects at home to industrial protocols. A practical example can be a robot deciding where to move because, depending on the conditions it encounters, it will move to a different place.
Chapter 4: Loops
Loops are extremely important for different reasons, and they can be used for many applications, such as programming a counter, which can then function as a stopwatch, or programming code that repeats the same instruction several times until the user enters the correct data.
While Loop
The while loop is the one that allows us to execute repeating sequences, one after another. It will be executed as long as its internal condition is of the "True" type; if the opposite occurs, that is, if the condition becomes of the "False" type, the cycle finishes, and the normal execution of the program continues.
This type of loop is widely used, since it has different uses, from loops controlled by counting to loops that can become infinite.
The syntax of While is the following:
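A sketch consistent with the description below:

x = 0
while x < 14:
    print(x)
    x = x + 1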
This example is very simple: we declare a variable with the name x whose value will be 0, and we establish a condition whereby the loop will be executed as long as x is less than 14. While this holds, the program proceeds with the execution and prints the result.
There is a variety of while loop types; in this chapter, we are going to talk about while loops controlled by counting and the infinite while loop.
Loop "while" controlled by counting: This type of loop is going to have a counter, which will repeat the process as many times as we have indicated. Example:
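A minimal counting sketch (the cycle count is invented):

count = 0
while count < 5:                        # repeat exactly 5 times
    print("Cycle number: " + str(count))
    count += 1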
Loop "while" infinite: In this type of loop, when a case arises in which we
have a number of indeterminate instructions, we will only have that our
condition is of the while(True) type, and this will allow the program to work
properly.
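A sketch of the chronometer described below:

import time

x = 1
while True:          # the condition is always True, so the cycle repeats forever
    print(x)
    x = x + 1
    time.sleep(1)    # wait one second before the next repetition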
In this example, we import the time module; although we have not yet explained the use of modules, we make use of it here.
Then we initialize the variable x, which holds the value of one; later, we enter an infinite cycle (as the condition is True, it always repeats itself), we print the value of x on screen, then we increase the value of x by one unit and finally, with the help of the time module, we apply a delay, waiting a short time, specifically a second, until the cycle repeats itself. As you can imagine, in this example, a simple chronometer was programmed.
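The next example uses break; a sketch consistent with the description below (the prompt text is invented) might be:

while True:
    number = int(input("Insert a number: "))
    if number == 0:
        break
    print("You did not insert 0, so I can not break the loop")
print("Finish")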
In line number three of that sketch, we can find an if statement, which gives the condition that if the value zero is entered, the cycle is broken. If not, the program prints on the screen that you did not insert 0, so it cannot break out of the loop.
Finally, when the condition is satisfied, the program exits the cycle and prints Finish.
We can observe that the break statement is very useful to end a cycle, as in the example we just saw; there are other types of cases in which it can also be very useful. The break statement is considered an essential tool when programming.
How do I know in which case to use a break statement? You might be thinking that it is used mostly for infinite cycles, because breaking those makes a lot of sense, but in a program we can encounter all kinds of exceptional situations, and this statement can help us escape from an unwanted error.
In the next example, we will have the case of a while cycle in which the values that are multiples of 2 are omitted: if a value is a multiple of 2, nothing is printed on the screen for it, and the loop jumps to the next iteration, as the sketch below shows.
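A sketch consistent with the description that follows (the prompt and message texts are invented; the count - 1 bound reflects the text's explanation):

cycles = input("Enter the number of cycles: ")
count = 0
while count - 1 < int(cycles):
    x = count          # x holds the current value of the cycle
    count += 1
    if x % 2 == 0:
        print("Multiple of 2, skipping this iteration")
        continue       # jump to the next iteration
    else:
        print(str(x) + " is not a multiple of 2")
    print("Cycle: " + str(x))
print("The program has finished")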
We begin by creating a variable with the name cycles, which is an input; it asks the user to enter the number of cycles to be performed by the program and, as is already known, input gives us string data.
Following this, we create the integer variable count; this variable is going to work as a counter, so that we know which cycle we are in.
Now we create a while loop, whose condition indicates that count - 1 must be less than the cycles variable (converted to an integer).
What does count - 1 mean? Well, this is just to make sure that the program does what we want and that our code works optimally. It is always recommended that, every time we make a program of this type, we subtract a unit so that we fulfill the number of cycles that correspond to our code.
Next, in the following line, we declare a variable x, which takes the same value as the variable count, since this helps us to know the current value of the cycle; in addition, count is then increased by one unit (+= 1).
Once this is done, we go into the block of conditionals, which in this case is x % 2 == 0. What this means is nothing more than that the remainder obtained from the division between x and 2 must be equal to zero in order for the if statement to be executed.
The instructions of this statement are simple: first, print on the screen the notice of what has occurred; after this, we use the continue statement.
In the hypothetical case that the established condition is of the False type, the instruction found in the else block will be executed.
Once we have finished with our conditional blocks, the number of the cycle is printed on the screen, in order to know the position of our program.
Finally, our program prints a message indicating that it has finished.
The pass statement does not affect the behavior of our code in any way, and it can be used anywhere in it, whether in a cycle or even in a function. The example sketched below illustrates this.
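A sketch consistent with the description that follows (the prompt and message texts are invented):

cycles = input("Enter the number of cycles: ")
count = 0
while count - 1 < int(cycles):
    x = count
    count += 1
    if x % 5 == 0:
        print(str(x) + " is a multiple of 5")
        pass           # a null operation: it does nothing
    else:
        print(str(x) + " is not a multiple of 5")
    print(x)           # printed whether the condition was true or false
print("End")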
Observing this example, we can understand more clearly what this statement is about.
First, we have declared the variable cycles; this variable is going to determine the cycles that the program will run and, being an input variable, as we already know, it is of the string type.
Next, we create a counter (count); its function is to indicate which position of the cycle we are in.
Once this is done, we can start working with our cycle, that is, the while, and the condition it must satisfy is that count - 1 must be strictly less than int(cycles). Then we declare the variable x, which takes the value of count, and count is increased by one unit so that our program makes sense.
Now we must take care of the conditional: with the % operator, our program is able to verify whether x meets the condition of being a multiple of 5. It is very important to take this into consideration, since the result it returns gives the correct functionality to our program; let us also remember that the counter starts at zero.
In the case that the established condition is satisfied, the program prints what we have indicated and then continues; the pass statement that follows does nothing, acting only as a placeholder. If the condition is of the False type, the program automatically enters the else block and only prints that the number entered is not a multiple of 5.
Once the conditional block has finished, the program does not take into consideration whether the condition was true or false, but automatically prints the current position of the cycle through print(x).
At the end of the while cycle, the program prints a string; in this case, we place an "End" to indicate that the program has finished correctly.
With this example, we can see that adding a pass statement is nothing more than adding a placeholder that does nothing; it does not alter the behavior of the program.
For Loops
Cycle For: This is the other loop that we can find in Python, and it is responsible for repeating blocks of code a certain number of times, so it is extremely useful. On certain occasions, a while loop controlled by counting can even be replaced by a simple for.
One of the benefits of using the for is that it is not necessary to create a variable beforehand to enter the for; it iterates over an element that can be iterated, such as a list, a string or a range. Another noteworthy aspect is that it is not necessary to write a statement that increases the counter each time an iteration ends.
To understand this type of loop a little more, let's look at the following example:
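A sketch matching the description below (the prompt text is invented):

count = int(input("Enter the number of cycles: "))
for i in range(count):
    print("Cycle number: " + str(i))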
As we can see in this example, the count variable is created, which is an integer depending on the user's input. Later, we create a for, in which i will iterate over the range of the counter. But what does range mean? range returns a sequence that has a certain number of items, depending on count in this case; therefore, the variable i will iterate as many times as count specifies. Finally, the number of the cycle we are in is printed on the screen.
A very useful example of the use of the for is one in which we move over a
list, as we can see below:
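A sketch consistent with the description that follows:

countries = ["USA", "Venezuela", "Italy", "Spain", "Germany"]
for country in countries:
    print(country)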
In this example, we can see that we create a list which has the names of different countries, such as USA, Venezuela, Italy, Spain and Germany; then we enter the for cycle, which has a variable that iterates within the list countries; this variable is country. Finally, in each iteration, an item within the list is printed, which means that the name of a country is printed.
Chapter 5: Functions
• The code that goes with the function will always start after the ":" character.
• It is really important to indent our code correctly and be very careful with its indentation (four spaces).
• The parameters that go with the function must be defined inside its parentheses.
Once created, how can I make use of it? In order to call or use a function, all we have to do is declare it before the point in the code where we use it. This is really important since, if this is not done, the Python interpreter won't be able to identify it, so for the interpreter the function will not exist, and the call will not work.
Parameters
We call parameters all those values that are going to be used in a function; they work as inputs. Functions can receive one or more parameters; they just have to be written correctly, separated by commas, for the function to be invoked.
For example:
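A sketch consistent with the description that follows (the name Hello and the variables x and y come from the text; the message format is inferred from the sample output shown further below):

def Hello(name, age):
    print("Hello " + name + ", I am " + age + " years old")

x = "Matt"   # a string related to the name
y = "16"     # a string related to the age
Hello(x, y)  # prints: Hello Matt, I am 16 years old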
With this, you can see how easy it is to define the beginning of a function. The use of the word def is really, REALLY, important, since it is what tells the Python interpreter that a function is being created.
As we already know, the parameters are the values that the function is going to receive when it is defined. The values sent are called arguments, and they are classified as follows:
• Argument by position: when we send arguments to a function, it receives them in order, as defined previously.
In this example, you can see that a function named "Hello" was created; its parameters are name and age. We have created this function to print a message on the screen saying the name and age that the user provides.
We also defined the variables x and y, which are strings related to the name and the age. It is important to do this because, otherwise, the function will not be able to run.
Finally, we called the function "Hello", and we have put the arguments in a specific order so that the message printed on the screen makes sense. If we had written Hello(y, x), the message printed on screen would have been like this:
"Hello 16, I am Matt years old".
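The next two listings can be sketched as follows (the name number() comes from the text; the rest are assumptions):

def add(a, b):
    return a + b

c = add(2, 3)   # c stores the value returned by the function

def number():
    print("This is the number() function")

number()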
In the first one, we created the variable c, which holds the value of the return. To call the function, all we have to do is write its name together with the values it will receive inside the parentheses.
As you can see, the function number() has no input parameters; hence, what it does is simply print its message.
Return Statement
Up to this point in the book, you may have noticed that most of Python's functions have return values, which can be explicit or implicit. We also know that return is a Python keyword; its purpose is to end the execution of a function in order to hand back a result value from it.
Lambda Function
We define a lambda function as a special function; it is part of the default features of the Python programming language, and it is an exclusive kind of function.
The lambda function has a special syntax, and that is the reason why we consider it exclusive: it allows us to create any kind of small function in a really fast way, without having to name and save it.
Its syntax is really easy and simple: you only have to write the reserved word lambda, the arguments and finally the ":" character, like this:
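A sketch consistent with the description below (the names sum and x come from the text):

sum = lambda a, b: a + b   # two parameters, a and b, which are added
x = sum(3, 4)              # x stores the result of adding a and b
print(x)                   # -> 7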
You can easily see the use of the lambda function: we have a sum function that we set equal to a lambda, which has two parameters, a and b, which are added. Additionally, we have the variable x, which stores the result of adding the respective values of a and b.
The lambda function is mostly used to create functions that are needed in a program for a really short time: since they are used so quickly and so little, naming them is not necessary. This function is commonly used with the filter() and map() functions.
Filter() Function
We define the filter() function as the one that has the task of filtering arguments, whether they are lists or iterators. As a result, it always returns an iterable with the elements that passed the filter.
In this example, we will show you how it works: it will filter the people who are allowed to enter a venue for people 18 and over. We will define, then, the filter function and the iterable.
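A sketch consistent with the description below (the name passfilt comes from the text; the ages are invented):

ages = [15, 22, 17, 34, 18, 12]

def passfilt(age):
    return age >= 18          # True if the person may enter

allowed = list(filter(passfilt, ages))

for age in allowed:
    print(str(age) + " is an allowed age")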
So, you can see how we created a list of the different ages of the people who want to go to the venue. Then, we created the function passfilt, which filters the data it is given, returning either True or False. The next sentence creates a list with the people who passed the filter.
Lastly, we created a for loop, which shows the obtained results on the screen, telling the user which ages were allowed.
Map() Function
This function executes an operation on the elements of a tuple or a list, in order to return the results of those operations as a sequence. This function is used mostly to simplify the code and make it shorter, avoiding the creation of unnecessary loops or other resources.
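A sketch of the two listings described below (the book names the helper sum(); the list contents are invented):

def sum(a, b):      # note: this hides Python's built-in sum()
    return a + b

list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = list(map(sum, list1, list2))   # -> [5, 7, 9]

str1 = ["Hello ", "Good "]
str2 = ["world", "morning"]
result2 = list(map(sum, str1, str2))    # -> ['Hello world', 'Good morning']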
In this example, you can observe that, at first, we defined the sum() function, which returns a + b. Then, we created the lists list1 and list2, which have integers as items. Later, we created the result variable, which holds the value returned by the map function: the addition of the items of both lists, one by one. Then we used the list() function to convert that result into a list.
The second example is pretty much like the first one. The only difference is that, instead of using integers, we used strings, and instead of adding, the function concatenates those strings. Finally, we used the list() function again in order to get a list as a result.
Sometimes, to get somewhere, we will find two different ways, one shorter and simpler and the other longer and harder; that is what happens here. Lambda would be the easy way, since it allows us to use functions easily in our code.
Although the lambda function is better for saving lines of code, sometimes the def statement is easier to understand for users who are getting started in programming, and even for those who already have some experience, but not that much.
Global Variables
These variables are really important, since they are always the same no matter where in the code they are used. They always refer to the same value, and we can use them all through the code.
One recommendation about their use is that you must be really careful, because with them you can make mistakes very easily.
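A sketch consistent with the arithmetic described below (the name vari comes from the text; the rest are assumptions):

vari = 0   # global variable

def sum(a):
    global vari          # access the global variable inside the function
    vari = vari + a
    return vari

sum(7)    # vari is now 7
sum(15)   # vari is now 7 + 15 = 22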
As you can see in the example, we declared the variable vari as a global variable; hence, we can access it anywhere in the code.
Then, we defined the function sum, which has a as a parameter and returns the addition of the global variable vari plus the parameter entered in the function.
At last, vari is set to vari plus seven, so vari would be seven; then vari is set to vari plus fifteen, and it will be equal to twenty-two.
Local Variables
This is another type of variable, which is very useful because it only exists inside a block of the program. After the block has ended, the variable is deleted. This type of variable is good and recommended when it comes to not wasting memory. Here is an example:
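A sketch matching the description below:

def sum(a, b):
    result = a + b   # local variable: exists only inside the function
    f = "HI"         # another local variable
    return result

a = sum(1, 3)   # a is 4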
You can see that here we defined the sum() function, which has two parameters, a and b. The next thing done was to create the local variable result, which is equal to the sum of the parameters. Later, the local variable f is created, which is equal to the string "HI", and lastly, we ask the program to return the result variable.
After creating the function, we say that the variable a will be equal to the result of sum(1, 3); then a will be equal to 4.
In other words, we can define a module as a simple file with a .py extension that is capable of defining functions, variables and classes, and can also include executable code.
Another advantage of using modules is that they let us reuse code, using its definitions and linking individual files to broaden our program.
The main reason why we think that modules are a very useful tool when it comes to programming is that they are really helpful for organizing and reusing our code. This is very important when we talk about OOP (Object-Oriented Programming), since in that mode, modularization and reuse are very popular. Since Python is a programming language oriented toward that, it turns out very user-friendly.
Python, in its standard library, has a big number of modules; we can see them in the official manual. You can find it at the following link: http://docs.python.org/modindex.html.
In case we want to create a module of our own, we will have to do the following. We will make a program in which we create a module that can be used later.
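A sketch of such a module, consistent with the calculator function mentioned later in this chapter (the file name mymodule.py and the supported operations are assumptions):

# mymodule.py  (hypothetical file name)
def calculator(a, b, op):
    if op == "+":
        return a + b
    elif op == "-":
        return a - b
    elif op == "*":
        return a * b
    elif op == "/":
        return a / b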
As you can see, the syntax is really simple, since it is pretty much like creating a function. After we have created it, we must be able to import it from another program; in order to do that, we will use the import statement.
Import Statement
A module is able to contain definitions of functions and even statements, which can be executable. With this, it is possible to initialize a module, since these statements execute only the first time the module is imported.
Modules are capable of importing other modules; that is why people usually put import statements at the beginning of each file. The names of the imported modules are placed in the importing module's global namespace.
With the help of the previous example, we can import the module created earlier and use the functions that we defined there.
As you see in this example, we created the op variable, which takes on the task of storing a string that specifies the option the user chooses.
Then, two variables are initialized, a and b; they store the values of the operands we are going to use to perform the mathematical operations.
Afterward, the result variable stores the value that the calculator function returns, according to the operands and the type of operation the user wants. The calculator function comes from the module that we have imported.
When the Python interpreter finds the import statement, it imports the module, as long as it is located on the search path. The search path is nothing but a list of all the directories that Python looks in before importing any module.
How to Import a Module?
To import a module, some steps are performed at the moment of execution: the module is looked up along the module search path and compiled to byte code, and lastly, the byte code of the module is executed in order to build the object that defines it.
Namespaces in Modules
As you know, modules are files. Python creates a module object in which all the names we assign in that module file are contained. What does that mean? It means that namespaces are just places where names are created, names which later become attributes of the module object.
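The class discussed next can be sketched as follows, consistent with its description:

class PC:
    brand = "Dell"
    ram = 4              # GB of RAM
    rom = 512            # GB of storage
    processor = "Intel i5"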
As we can see in this example, we created the PC class: the brand is Dell, it has 4 GB of RAM, 512 GB of storage and an Intel i5 processor. This is the way we can create a class that only has attributes, but to make use of this class, we need to create an object, which is explained below.
Object characteristics:
• Object: An object is an instance of a class; this entity is the result of a set of properties, attributes, behaviors or functionalities with which it reacts to events that occur in the program.
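A sketch of the instantiation described next:

myPC = PC()        # our first object, of the PC type
print(myPC.ram)    # the dot gives access to the attributes -> 4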
As we can see in this example, we created the class as we did previously; then we created our first object, with the name myPC, which is of the PC type. To verify that the RAM of our computer is four gigabytes, we use the dot notation.
This notation is used to access the attributes of our objects, and this is done by placing a dot, as seen previously, and then writing some attribute of the object.
But this sounds a little repetitive, since it seems that all computers are the same. Python has already thought about this: for it, there are constructors, so that we can make objects as the user wants and not in a predetermined way. In the following example, we place constructors and methods in the same example, with the objective of making a complete example that is clear to the readers:
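A sketch consistent with the description that follows (the attribute and method names are inferred from the text):

class PC:
    def __init__(self, brand, ram, rom, processor):   # the constructor
        self.brand = brand
        self.ram = ram
        self.rom = rom
        self.processor = processor
        self.on = False

    def turnOn(self):
        self.on = True          # self gives access to the attributes
        print("The PC is on")

    def turnOff(self):
        self.on = False
        print("The PC is off")

myNewPC = PC("Acer", 8, 1000, "AMD")
myNewPC.turnOn()
print(myNewPC.processor)   # -> AMD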
In this example, we can see how to create a complete class. It has its respective constructor, which uses the reserved word self, used to access an attribute from any method without any inconvenience. Observing how the other methods were created, you can see that they use the word self as an argument; no matter what method it is, that word should always be there. Also, every time you want to access an attribute within the class, it is necessary to use the word self. You can also see how we added the behavior of turning the computer on and off.
The next step was to create an object called myNewPC, which is of the PC class and has these characteristics: its brand is Acer, it has 8 GB of RAM, 1 TB of storage and an AMD processor. The next step was to turn on the PC, using the turnOn() method through the dot notation; finally, we want to know which processor our new computer has, also using the dot notation.
Encapsulation: This is based on bringing together all the elements that are considered of the same essence, at the same level of abstraction, in order to achieve a better design of the structure of the system's components.
Now, in the case that an object inherits more than one class, this object has
greater complexity, being a very specific instance.
Creating a child class: Knowing the theory, we can now see the following example, where we create a cellphone class, because, as we all know, all modern cell phones are computers, but not all computers are cell phones. So let's see the following example:
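A sketch consistent with the description below, reusing the PC class sketched above:

class Cellphone(PC):             # Cellphone inherits from PC
    def call(self):
        print("The phone is calling")

myPhone = Cellphone("Huawei", 4, 128, "ARM")
myPhone.call()
print(myPhone.brand)   # -> Huawei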
We also create a call() method so the phone can call; this method shows a message saying that the phone is calling.
To instantiate a child class, what we do is write a variable, which will be of the cellphone class, and we insert the different arguments, such as the Huawei brand and the ARM processor, among other features. Then we call the call() method so that our cellphone calls, and finally we visualize the brand of our phone, which is Huawei.
Chapter 8: File management
When we work with files, they are manipulated in the following way: first they are opened, then we operate on them, and finally we close them.
We can define Python files as a set of bytes that contain a certain composite structure.
File header: These are the data about the file (name, size, type) that we are going to work with.
File data: This is the body of the file and has the content written by the programmer.
End of file: This marker is the one that indicates that the file has reached its end.
Open() Function
To open a file in Python, we use the open() function, which takes charge of receiving the name of the file and the mode in which the file will be opened, according to its parameters. If you don't enter a file opening mode, it will automatically open with the default mode, as a read file.
It is important to note that the operations on open files are limited by the mode: it is not possible to read a file that was opened only for writing, and we cannot write in a file that was opened only for reading.
There are two types of files in Python: one of them is text files, and the other is binary files, so we must take this into account when specifying the format the file will be opened in, to avoid any confusion and errors in our code.
File: This argument provides the name of the file we are going to access through the open() function; this is what we call the path of our file.
Mode: These are the modes with which we are going to access the file, and they are responsible for defining the way in which our file is opened, whether for reading, writing or appending.
w: This is the write mode, which opens a file for writing only; this mode
creates a new file in case the file we are working with does not exist.
w+: This is still the write mode, but the plus means it also allows reading
the file.
wb: This is still the write mode, but it opens the file in binary format.
wb+: This is still the write mode, but the file is opened in binary format
and can also be read.
r: This is the read mode, which is the default when opening any type of
file. This mode opens the file for reading only.
r+: This is still the read mode, but the plus allows the file to be opened
for both reading and writing.
rb: This is the same read mode, but it opens the file for reading in binary
format only.
a: This is the append mode, which opens a file so content can be added to
it. Writing starts from the end of the file itself, and this mode also
creates a new file in case the file we are working with does not exist.
ab: This is still the append mode, but it opens the file in binary format.
It also creates a new file in case the file we are working with does not
exist.
a+: This is still the append mode, but the plus also allows reading the
file.
ab+: This is still the append mode, but besides being opened in binary
format, it also allows reading the file.
Read([n]) Method
The read() method reads the amount of data indicated between the brackets
and returns it as a string; if no amount is given, it reads the whole file.
Readline() Method
The readline() method reads a single line of our file and returns the bytes
it has read in the form of a string. It is important to mention that this
method cannot read more than one line, even if the number "n" of bytes
requested is higher than the number of bytes on that line. (The related
readlines() method, by contrast, reads all the lines of the file into a
list.)
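A short sketch of both methods, assuming a hypothetical "example.txt" that contains several lines of text:

f = open("example.txt")
first_line = f.readline()    # one line, newline included
chunk = f.read(10)           # up to 10 more characters of the file
rest = f.read()              # everything that remains
f.close()
print(first_line, chunk, rest)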
There are several attributes that we will come across when opening our
files, and they help us learn more about those files.
File.name: This attribute returns the name of the file we are working with.
File.mode: This attribute returns the access mode with which we opened the
file.
Close() Function
The close() function writes out any information still held in our program's
memory for that file and releases the associated resources; this way, we
can close a file properly. Even though a file can end up closed in other
ways, such as when its file object is reassigned to another file, this
function is the most common.
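A brief sketch of close() in action; the with statement shown second is a common alternative that closes the file automatically (the file name is again a placeholder):

f = open("example.txt")
data = f.read()
f.close()                # flushes pending data and releases the file

with open("example.txt") as f:   # closed automatically when the block ends
    data = f.read()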
What Is a Buffer?
A buffer is like a temporary file that holds a fragment of the data (the
data that composes the files of our operating system) in RAM memory.
We usually make use of buffers when we are working with a file whose
storage size is unknown.
If the size of our RAM memory is less than the size of the file that our
program must handle, the processing unit won't be able to work on the whole
file at once.
It is really important to know the size of the buffer, since it indicates
the storage available at the moment of using our file.
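The open() function accepts a buffering argument that controls this behavior; here is a small sketch with illustrative values (note that buffering=0 is only allowed in binary mode):

f = open("example.bin", "wb", buffering=0)    # unbuffered: bytes go straight to disk
f.write(b"raw bytes")
f.close()

g = open("example.txt", "w", buffering=4096)  # a 4 KiB buffer held in RAM
g.write("buffered text\n")
g.close()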
Errors
When opening files, we have an optional string argument called errors. That
string specifies how we will handle the encoding errors that may arise in
our program.
ignore: This value makes the program ignore the characters that have a
wrong format.
strict: This value raises a UnicodeError (or one of its subclasses) in case
there is any kind of encoding failure in the file we are working with.
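A sketch of both values, with a hypothetical file "data.txt":

with open("data.txt", encoding="utf-8", errors="ignore") as f:
    text = f.read()     # badly encoded bytes are silently dropped

with open("data.txt", encoding="utf-8", errors="strict") as f:
    text = f.read()     # raises UnicodeDecodeError on a badly encoded byte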
Encoding
Now we'll talk about string encoding, which we often use when we're working
with stored data. But what is an encoding? It is simply the rule that maps
the characters we read and write onto the bits and bytes the system
actually stores.
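For instance, a quick sketch (the file name is a placeholder): the same text produces different bytes under different encodings:

text = "café"
print(text.encode("utf-8"))    # b'caf\xc3\xa9'  (the é takes two bytes)
print(text.encode("latin-1"))  # b'caf\xe9'      (the é takes one byte)

with open("example.txt", "w", encoding="utf-8") as f:
    f.write(text)              # stored on disk using the utf-8 bytes above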
Newline
When we talk about newline mode, we refer to the argument that controls how
new lines are handled; its possible values are: '\r', '', None, '\n', and
'\r\n'.
This is known as universal newlines, and it can be seen as a consistent way
of interpreting the line endings of the text our code works with.
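A small sketch of the newline argument (the file name is a placeholder): when writing, '\n' characters are translated to the value given:

with open("example.txt", "w", newline="\r\n") as f:
    f.write("first line\nsecond line\n")   # each \n is written as \r\n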
xlsx Files:
These files allow us to work with spreadsheets as if we were working in the
Microsoft Excel program; because the xlsx format is compressed, these files
tend to weigh much less than the same data stored in older spreadsheet
formats.
This type of file is very useful when we work with databases, numerical
calculations, graphics and any other type of automation.
To start working with this type of file, we will have to install the
necessary library, and this is done through the command "pip3 install
openpyxl" in the terminal.
Once our command has been executed, the openpyxl module will be available
to our Python programs.
In the example below, we create our file by importing the Workbook class,
which belongs to the openpyxl module; we assign a Workbook object to a
variable, declaring that it will be our working document, then we add the
content, give the file its name, and save it.
Here we can see that the main role of append() is to accept iterable data
such as tuples.
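A minimal sketch of those steps; the file name "inventory.xlsx" and the row contents are assumptions of ours:

from openpyxl import Workbook

wb = Workbook()                  # wb is our working document
ws = wb.active                   # the default worksheet
ws.append(("ball", "year"))      # append() accepts iterables such as tuples
ws.append(("Fevernova", 2002))
wb.save("inventory.xlsx")        # give the file its name and save it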
To read an xlsx file, we will only need to import the load_workbook class
and know the name of the file we are going to work with. It is also very
important that the file is in the same folder in which the program is
stored; otherwise, an error will automatically be generated.
Once this is done, we specify the object to work with, and we ask for the
information we need to read in order to finally print it.
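Continuing the sketch above with the same assumed file name:

from openpyxl import load_workbook

wb = load_workbook("inventory.xlsx")          # must sit next to the program
ws = wb.active
for row in ws.iter_rows(values_only=True):    # read the stored rows back
    print(row)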
PDF Files:
PDF files can be opened from any device (computer, tablet, smartphone),
since no specific program is needed to view them, and their weight is much
lower than that of a comparable Word document, because pdf files are
automatically compressed once they are created. On the other hand, editing
an already created PDF file is very complicated because of its level of
protection, which is why in this chapter we will only learn to create such
files.
First, we will download the library with the command "pip3 install fpdf".
With the simple example sketched below, we can see the level of difficulty
of working with a PDF file. We will need several commands: we start with
the FPDF class of the fpdf library and create our pdfdoc object (it is
recommended that the names be coherent to avoid confusion, but they can be
whatever you prefer); this will be our pdf document.
Once we have created this, we customize the format, size and font style
through the set_font command.
Next, we add a page with the command add_page(); this will be the page on
which we are going to write, since fpdf does not create a blank page by
default. Here we insert the information with the help of the cell()
function, which contains the width and height that our cell will occupy.
Finally, we save our document with the command output() together with the
arguments that accompany it, to store our file under its name. It is
important that the name contains the ".pdf" extension so that the program
understands that we want a pdf file, and we end with the string "F", which
tells fpdf to save to a local file.
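Putting those steps together, a minimal sketch assuming the classic fpdf (pyfpdf) API; the file name, text and cell sizes are illustrative:

from fpdf import FPDF

pdfdoc = FPDF()                       # our pdf document object
pdfdoc.set_font("Arial", size=12)     # format, size and font style
pdfdoc.add_page()                     # fpdf does not add a blank page by default
pdfdoc.cell(40, 10, "Hello, PDF!")    # width and height the cell will occupy
pdfdoc.output("document.pdf", "F")    # "F" means: save to a local file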
Handling of BIN Type Files
As we have been learning, Python handles a great variety of file types,
which are not exclusively text files: some are processed line by line and
others byte by byte, giving a different meaning to the program. It is
important that these files are manipulated in their correct format, because
otherwise an error will be generated.
Opening a file in binary format is nothing more than adding a b to our mode
parameter.
If we do not know the current position within the file, the file.tell()
function will indicate the number of bytes that have elapsed since the
start of the file.
If we want to modify the current position in our file, we will use the
function file.seek(offset, whence), since this allows us to move to a
certain byte position counted from the beginning, the current position, or
the end of the file.
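A short sketch of both functions on a binary file (the file name is a placeholder):

with open("example.bin", "wb") as f:
    f.write(b"0123456789")

with open("example.bin", "rb") as f:   # the b in the mode means binary
    f.read(4)
    print(f.tell())     # 4: bytes elapsed since the start
    f.seek(2, 0)        # byte 2, counted from the beginning (whence=0)
    print(f.read(3))    # b'234'
    f.seek(-2, 2)       # two bytes before the end (whence=2)
    print(f.read())     # b'89'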
Chapter 9: Exceptions
Exceptions can be defined as errors that occur during the execution of the
program. Even when the syntax of the code is written correctly, something
unexpected can happen during execution: an error that prevents the line of
code where it appears, and the subsequent ones, from being executed.
In this case, the program will fail when it reaches the line of code where
the unexpected error occurs. The following lines will probably not be
executed, and they may contain important information for our program that
we want to be carried out. That is why it is important to know how to
handle the exceptions that can be presented to us.
Several such errors could appear at runtime; below, we sketch one of the
most common cases to explain this type of error:
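The original code is not shown here, so this is a minimal sketch of what it likely looks like: two int() conversions followed by a division:

a = int(input("Numerator: "))     # int() fails if the input is not an integer
b = int(input("Denominator: "))
print(a / b)                      # fails if b is zero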
Looking at the code, there are several ways an error may arise: the user
can enter values that are not integers, in which case the int() function
will generate an error; on the other hand, an error can occur in the
division, because the denominator can be equal to zero, generating a
division-by-zero error.
What Is the Possible Solution to This Problem?
These problems can be solved by doing what is known as "capture or
exception control", which basically consists of telling our code to try to
perform the instruction and, in case it cannot, to at least let the rest of
the program execute.
The line that generated the error in the previous example usually does not
produce a division by zero; it is an unexpected error, and perhaps that
case will never occur. But applying "capture or exception control"
guarantees that the rest of the lines of code that follow this one can be
executed successfully.
The first thing we must do is discover where the error occurs and what it
is called. This part is very important to determine, since if we do not
find the line of code where the error is being generated, we will not be
able to apply the necessary instruction to solve the problem.
Once the instruction that generates the error has been identified, it must
be placed inside a "try" block, which means exactly that: try. Just before
the instruction that generates the error we introduce the word "try",
followed by the instruction itself, and then the word "except", followed by
the name of the error; continuing with the previous example, that name is
"ZeroDivisionError". Within this clause we can add whatever we want the
program to do: for example, we can print "cannot be divided by zero" and,
after the error message, return whatever we want, such as "erroneous
operation".
In conclusion, we can handle the error with "try": we are telling the
system to attempt the division, or whatever instruction is generating the
error, and if it does not succeed, to execute what is inside the "except",
which for this example is the message that says "cannot be divided by
zero". This process is somewhat similar to what happens with the "if" and
"else" statements.
This is how we control an exception: maybe we cannot fix the error itself,
but we will be able to keep executing the rest of the lines of code that
follow it and finish running the program.
We can also capture several exceptions. That is, besides division-by-zero
errors as explained above, other types of errors can happen. For example,
suppose the program asks us to enter a numerical value and we enter
something different from a number, such as text. Obviously it will throw an
error, in this case a "ValueError", and we do the same as in the previous
case; since we must now capture several exceptions, we begin again by
locating the line or lines of code that originate this value error.
The solution is to wrap these lines of code in another "try" and "except"
block, where we specify each error we want to capture, as was done in the
previous case, with the difference that now we are capturing two kinds of
error; then we run the code.
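A sketch of capturing both kinds of error in one block:

try:
    a = int(input("Numerator: "))    # may raise ValueError on non-numeric input
    b = int(input("Denominator: "))
    print(a / b)                     # may raise ZeroDivisionError
except ValueError:
    print("please enter a numerical value")
except ZeroDivisionError:
    print("cannot be divided by zero")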
There are several serial communication protocols, but in this case we will
focus on the two most used at present: RS-232 and RS-485. The first is used
almost everywhere; traditionally, computers included an RS-232 port, often
with a twenty-five pin connector, designed to communicate a large amount of
data over a short distance. On the other hand, we have RS-485, designed
mainly for industrial communications over long distances; its technology
transmits binary data bit by bit over a pair of two wires, which inhibits
the electromagnetic effects of long cable runs.
Some of the projects you could do with serial communication and Python are
device automation, equipment instrumentation, real-time measuring devices
and image processing, among other things. Therefore, we can assure you that
this tool is extremely useful for your professional life.
The library that we will use in this case is PySerial; for its
installation, we use the console and the command "pip3 install pyserial".
For more information, see its documentation:
https://pyserial.readthedocs.io/en/latest/
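A minimal sketch of opening a port with PySerial; the port name and baud rate are assumptions that depend on your hardware:

import serial

port = serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=1)
port.write(b"hello\n")     # send bytes down the serial line
reply = port.readline()    # read one line back, or b"" on timeout
print(reply)
port.close()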
But you don't need to be an engineer to venture into the area of databases
either, since you can imagine a database as an Excel sheet that is filled
in depending on the type of item you want to enter into it. For example,
imagine that you need a database of soccer balls: you have the balls of the
World Cups of 2002, 2006 and 2018, as well as several international
tournaments, so you need a database to keep an inventory of them.
Well, to design this, we need the knowledge of a very simple programming
language that is a little new for us: SQL, which is responsible for
programming databases. In fact, its syntax is very simple, and programming
this does not take much time, just a few minutes.
After that, you will be able to design your own databases and work with
them, either for your work or for your personal use.
For more information about the PyMySQL library, which lets Python work with
these databases, and how to use it, visit the following link to its
documentation:
https://pymysql.readthedocs.io/en/latest/
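As a sketch of the idea (the connection details and the table used here are placeholder assumptions, not from the original):

import pymysql

conn = pymysql.connect(host="localhost", user="user",
                       password="secret", database="inventory")
with conn.cursor() as cur:
    cur.execute("SELECT name, year FROM balls")   # plain SQL, as described
    for row in cur.fetchall():
        print(row)
conn.close()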
For graphical interface design there are many libraries, but the simplest
is Tkinter, which has a wealth of properties that allow us to make good and
beautiful interfaces; if you are a little more advanced in the area of
programming, you could use another library such as PyQt5, which has some
similarity to C++.
For the use of Tkinter, it is not necessary to install the library as with
the other two previous items, but what you should do when writing code is,
obviously, import the module through the import statement, as follows:
from tkinter import *
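Building on that import, a tiny sketch of a window (the title and texts are illustrative):

from tkinter import *

root = Tk()                        # the main application window
root.title("My first interface")
Label(root, text="Hello, Tkinter!").pack()               # a simple widget
Button(root, text="Close", command=root.destroy).pack()
root.mainloop()                    # start the event loop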
Now that you know all these topics that Python can cover, you can imagine
the potential you have in your hands. We can only tell you that the limits
are in your mind, so get to work and program whatever you want.
Conclusion
Thank you for making it through to the end of Python for Beginners: A Step
by Step Crash Course to Learn Smarter the Fundamental Elements of Python
Programming, Machine Learning, Data Science and Tools, Tips and Tricks of
This Coding Language. We hope it was informative and able to provide you
with all of the tools you needed to achieve your goal of becoming a
programmer.
The next step is to start programming with all the tools we provided you in
this book in order to keep improving your programming skills. As you have
seen, programming is not a very hard activity, but it is really important
to practice, since if you do not, things can be forgotten.
If you have any doubts about anything, read the section again and study the
examples; then imagine one of your own and code it in order to better
understand how each statement and element works.
As you saw in the last chapter, Python can be used for a lot of things; it
is used on almost everything. So if you are good at a certain area or
profession, now that you know how to code, you should be able to develop
programs that make your daily activities easier. If you are an economist,
you can make a program that calculates the taxes or fees on certain amounts
of money, and if you are a doctor, you can make a program that calculates
the amount of solution needed at the hospital. Those are just simple
examples of what you can do and achieve with this programming language.
Finally, if you found this book useful in any way, a review on Amazon is
always appreciated!