ML Chapter 01
Facebook: https://www.facebook.com/abdallahriig
LinkedIn: https://www.linkedin.com/in/abdallahmahmud/
What is Data?
Data is information of different types, usually formatted in a particular manner. All
software is divided into two major categories:
We use data science to make it easier to work with data. Data science is defined as a field that
combines knowledge of mathematics, programming skills, domain expertise, scientific methods,
algorithms, processes, and systems to extract actionable knowledge and insights from both
structured and unstructured data, then apply the knowledge gleaned from that data to a wide
range of uses and domains. Another definition: data is the foundation of data science; it is the
material on which all the analyses are based. In the context of data science, there are two types
of data: traditional, and big data.
• Traditional data: data that is structured and stored in databases which analysts can
manage from one computer; it is in table format, containing numeric or text values.
The term “traditional” is introduced here for clarity. It helps
emphasize the distinction between big data and other types of data.
• Big data: data that is bigger than traditional data, and not only in the trivial sense. From
variety (numbers, text, but also images, audio, mobile data, etc.), to velocity (retrieved and
computed in real time), to volume (measured in tera-, peta-, exa-bytes), big data is
usually distributed across a network of computers.
Abdallah Mahmoud
it as a separate field with three aspects: data design, collection, and analysis. It still took
another decade for the term to be used outside of academia.
Future of data science
Artificial
intelligence and machine learning innovations have made data processing faster and more
efficient. Industry demand has created an ecosystem of courses, degrees, and job positions
within the field of data science. Because of the cross-functional skillset and expertise required,
data science shows strong projected growth over the coming decades.
What is Information?
Information is defined as classified or organized data that has some meaningful value for the
user. Information is also the processed data used to make decisions and take action. Processed
data must meet the following criteria for it to be of any significant use in decision-making:
• Accuracy: The information must be accurate.
• Completeness: The information must be complete.
• Timeliness: The information must be available when it’s needed.
Today data is everywhere, in every field. Whether you are a data scientist, marketer,
businessman, data analyst, researcher, or in any other profession, you need to work and
experiment with raw or structured data. Because this data is so important, it must be handled
and stored properly, without errors. When working with data, it is important to know its type
in order to process it correctly and get the right results. There are two types of data,
qualitative and quantitative, which are further classified into four categories:
• Nominal data.
• Ordinal data.
• Discrete data.
• Continuous data.
Types of Data
Nominal Data
Nominal data is used to label variables without any order or quantitative value. Hair color
can be considered nominal data, as one color can’t be ranked above or below another.
The name “nominal” comes from the Latin word “nomen,” which means “name.” We can’t perform
numerical operations on nominal data or sort it into a meaningful order. These
data don’t have any meaningful order; their values are distributed into distinct categories.
Ordinal Data
Ordinal data have a natural ordering: each value has a position on a scale. These data are
used for observations like customer satisfaction, happiness, etc., but we can’t do arithmetic
on them.
Ordinal data is qualitative data whose values have some kind of relative position. These
kinds of data can be considered “in-between” qualitative and quantitative data. Ordinal
data shows only the sequence of values, so arithmetic such as averaging is not meaningful,
although order-based statistics such as the median can still be used. Compared to nominal
data, ordinal data have some kind of order that is not present in nominal data.
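The difference between nominal and ordinal data can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the hair colors, the satisfaction scale, and the survey responses are made-up values, not data from this chapter.

```python
from collections import Counter

# Nominal data: labels with no order. Counting and finding the most
# frequent label make sense; sorting or averaging the labels does not.
hair_colors = ["black", "brown", "blonde", "brown", "red", "brown"]
color_counts = Counter(hair_colors)
most_common_color = color_counts.most_common(1)[0][0]

# Ordinal data: labels with a natural order. This scale is hypothetical.
SATISFACTION_SCALE = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]
rank = {label: position for position, label in enumerate(SATISFACTION_SCALE)}

responses = ["happy", "neutral", "very happy", "unhappy", "happy"]

# Sorting by rank follows the order of the scale, not the alphabet.
sorted_responses = sorted(responses, key=rank.get)

# With an odd number of responses, the median is the middle element
# of the sorted list. It relies only on order, never on arithmetic.
median_response = sorted_responses[len(sorted_responses) // 2]

print(most_common_color)
print(sorted_responses)
print(median_response)
```

The mapping from label to rank is what makes the data ordinal: remove it and the responses become plain nominal labels again.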
Quantitative Data
Quantitative data can be expressed in numerical values, making it countable and suitable for
statistical analysis. These kinds of data are also known as numerical data. They answer
questions like “how much,” “how many,” and “how often.” For example, the price of a phone,
a computer’s RAM, the height or weight of a person, etc., fall under quantitative data.
Quantitative data can be used for statistical manipulation. These data can be represented on a
wide variety of graphs and charts, such as bar graphs, histograms, scatter plots, boxplots, pie
charts, line graphs, etc.
Discrete Data
The term discrete means distinct or separate. Discrete data contain values that are
integers or whole numbers. The total number of students in a class is an example of
discrete data. These data can’t be broken into decimal or fraction values. Discrete data are
countable and have finite values; their subdivision is not possible. These data are represented
mainly by a bar graph, number line, or frequency table.
Continuous Data
Continuous data can take fractional values. Examples include the height of a person, the
length of an object, etc. Continuous data represents
information that can be divided into smaller levels. A continuous variable can take any value
within a range.
The key difference between discrete and continuous data is that discrete data contains only
integers or whole numbers, whereas continuous data can hold fractional numbers, recording
quantities such as temperature, height, width, time, speed, etc.
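As a rough illustration of how the two kinds of quantitative data are usually summarized, the sketch below builds a frequency table for discrete values and equal-width bins for continuous values. The class sizes, heights, and the helper name `histogram_bins` are assumptions made for this example.

```python
from collections import Counter

# Discrete data: countable whole numbers (hypothetical class sizes).
students_per_class = [28, 31, 28, 30, 31, 28]
frequency_table = Counter(students_per_class)  # one bar per distinct value

# Continuous data: measurements that can take any value in a range
# (hypothetical heights in centimetres).
heights_cm = [158.2, 171.5, 164.9, 180.3, 175.0, 169.4]


def histogram_bins(values, bin_width):
    """Count how many values fall into each bin of equal width."""
    bins = Counter()
    for value in values:
        lower_edge = int(value // bin_width) * bin_width
        bins[lower_edge] += 1
    return dict(sorted(bins.items()))


height_bins = histogram_bins(heights_cm, bin_width=10)

print(dict(frequency_table))
print(height_bins)
```

Discrete values can be counted one by one, while continuous values have to be grouped into ranges first, which is why bar graphs suit the former and histograms the latter.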
At its core, machine learning is all about creating and implementing algorithms that facilitate
these decisions and predictions. These algorithms are designed to improve their performance
over time, becoming more accurate and effective as they process more data.
In traditional programming, a computer follows a set of predefined instructions to perform a
task. However, in machine learning, the computer is given a set of data and a task to perform,
but it’s up to the computer to figure out how to accomplish the task based on the examples it’s
given.
For instance, if we want a computer to recognize images of cats, we don’t provide it with
specific instructions on what a cat looks like. Instead, we give it thousands of images of cats and
let the machine learning algorithm figure out the common patterns and features that define a
cat. Over time, as the algorithm processes more images, it gets better at recognizing cats, even
when presented with images it has never seen before.
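The idea of learning from examples rather than following hand-written rules can be shown with a toy sketch. Instead of images, it assumes each animal is described by two made-up measurements, and it uses a 1-nearest-neighbour rule, which is only one of many possible learning methods.

```python
import math

# Hypothetical labelled examples: (weight_kg, ear_length_cm) -> label.
training_examples = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((25.0, 10.0), "dog"),
]


def predict(features, examples):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(features, example[0])
    )
    return nearest_label


# Neither point below appears in the training examples.
print(predict((4.2, 6.8), training_examples))
print(predict((27.0, 11.5), training_examples))
```

No rule such as "cats weigh less than 10 kg" is written anywhere in the code; the behaviour comes entirely from the examples, so adding more examples changes the predictions without changing the program.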
This ability to learn from data and improve over time makes machine learning incredibly
powerful and versatile. It’s the driving force behind many of the technological advancements
we see today, from voice assistants and recommendation systems to self-driving cars and
predictive analytics.
– Retail. Recommendation systems, supply chains, and customer service can all
benefit from machine learning.
– The techniques used also find applications in sectors as diverse as agriculture,
education, and entertainment.
• Enabling automation. Machine learning is a key enabler of automation. By learning from
data and improving over time, machine learning algorithms can perform previously
manual tasks, freeing humans to focus on more complex and creative tasks. This not
only increases efficiency but also opens up new possibilities for innovation.
How Does Machine Learning Work?
Understanding how machine learning works involves delving into a step-by-step process that
transforms raw data into valuable insights. Let’s break down this process:
Preprocessing improves the quality of your data and ensures that your machine learning model
can interpret it correctly. This step can significantly improve the accuracy of your model.
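Two common preprocessing operations, filling in missing values and rescaling a feature, can be sketched as follows. This is a minimal standard-library example; the column of ages and the function names are assumptions, and real projects often use library routines for the same job.

```python
from statistics import mean, pstdev


def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit standard deviation.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


raw_ages = [20, None, 30, 40, None]  # hypothetical column with gaps
clean_ages = impute_missing(raw_ages)
scaled_ages = standardize(clean_ages)

print(clean_ages)
print(scaled_ages)
```

Standardizing puts features measured in different units on a comparable scale, which matters for models that rely on distances or gradients.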
Step 3: Choosing the right model
Once the data is prepared, the next step is to choose a machine learning model. There are
many types of models to choose from, including linear regression, decision trees, and neural
networks. The choice of model depends on the nature of your data and the problem you’re
trying to solve.
Factors to consider when choosing a model include the size and type of your data, the
complexity of the problem, and the computational resources available.
Step 4: Training the model
After choosing a model, the next step is to train it using the prepared data. Training involves
feeding the data into the model and allowing it to adjust its internal parameters to better
predict the output.
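What "adjusting internal parameters" means can be seen in a small sketch that trains a one-feature linear model with gradient descent. The data, learning rate, and number of epochs are illustrative choices, and gradient descent is only one of several ways such a model can be fitted.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0  # the model's internal parameters
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_weight = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_bias = (2 / n) * sum(errors)
        # Move each parameter a small step against its gradient.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # generated from y = 2x + 1

weight, bias = train_linear_model(xs, ys)
print(round(weight, 3), round(bias, 3))
```

Each pass over the data nudges `weight` and `bias` in the direction that reduces the error, so the parameters move toward the values that generated the data.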
During training, it’s important to avoid overfitting (where the model performs well on the
training data but poorly on new data) and underfitting (where the model performs poorly on
both the training data and new data).
Step 5: Evaluating the model
Once the model is trained, it’s important to evaluate its performance before deploying it. This
involves testing the model on new data it hasn’t seen during training.
Common metrics for evaluating a model’s performance include accuracy (for classification
problems), precision and recall (for binary classification problems), and mean squared error (for
regression problems).
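The metrics named above can be computed by hand, which makes their definitions explicit. The sketch below uses made-up labels and predictions; in practice a library such as Scikit-learn offers equivalent functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary classification results.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical regression results.
true_values = [3.0, 5.0, 7.0]
predicted_values = [2.5, 5.0, 8.0]

print(accuracy(labels, predictions))
print(precision_recall(labels, predictions))
print(mean_squared_error(true_values, predicted_values))
```

Precision asks how many of the predicted positives were right, while recall asks how many of the actual positives were found; the two can differ sharply when classes are imbalanced.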
Step 6: Hyperparameter tuning and optimization
After evaluating the model, you may need to adjust its hyperparameters to improve its
performance. This process is known as hyperparameter tuning or hyperparameter optimization.
Techniques for hyperparameter tuning include grid search (where you try out different
combinations of hyperparameter values) and cross-validation (where you divide your data into
subsets, train your model on all but one subset, and validate it on the held-out subset,
rotating until each subset has been held out once, to ensure it performs well on different data).
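Grid search and cross-validation are often combined: every candidate hyperparameter value is scored by cross-validation and the best-scoring value is kept. The sketch below does this for the number of neighbours `k` in a small k-nearest-neighbours classifier. The data set (including one deliberately mislabelled point), the fold layout, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training examples closest to `point`."""
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds):
    """Average accuracy over n_folds, holding each fold out once."""
    scores = []
    for fold in range(n_folds):
        validation = data[fold::n_folds]
        training = [ex for i, ex in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(training, x, k) == y for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / n_folds


def grid_search(data, candidate_ks, n_folds=3):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cross_val_accuracy(data, k, n_folds) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical two-cluster data; (1.1, 1.2) is deliberately mislabelled.
data = [
    ((1.0, 1.0), "low"), ((5.0, 5.0), "high"), ((1.2, 0.8), "low"),
    ((5.3, 4.7), "high"), ((0.8, 1.3), "low"), ((4.6, 5.2), "high"),
    ((1.1, 1.2), "high"), ((5.1, 5.4), "high"), ((1.4, 1.2), "low"),
    ((4.9, 4.8), "high"), ((0.9, 0.7), "low"), ((5.5, 5.1), "high"),
]

best_k, scores = grid_search(data, candidate_ks=[1, 3, 5])
print(best_k, scores)
```

With a single neighbour the mislabelled point misleads nearby predictions, so a larger `k` scores better here. This is the kind of trade-off that tuning is meant to reveal.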
and other information, the algorithm can group customers into segments that exhibit similar
behaviors without any pre-existing labels.
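Grouping customers without labels can be sketched with k-means clustering, one common unsupervised method (the text does not say which algorithm is meant). The customer features, the naive initialisation, and the function name are assumptions for this example.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters with a simplified k-means loop."""
    centroids = list(points[:k])  # naive start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [
    (2, 20.0), (25, 310.0), (3, 25.0),
    (22, 290.0), (1, 15.0), (27, 330.0),
]

centroids, clusters = k_means(customers, k=2)
print(centroids)
print(clusters)
```

No segment names are supplied anywhere; the two groups emerge from the distances alone. In practice the features would usually be scaled first so that one of them does not dominate the distance.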
The finance sector has also greatly benefited from machine learning. It’s used for credit scoring,
algorithmic trading, and fraud detection. A recent survey found that 56% of global
executives said that artificial intelligence (AI) and machine learning have been implemented
into financial crime compliance programs.
Transportation
Machine learning is at the heart of the self-driving car revolution. Companies like Tesla and
Waymo use machine learning algorithms to interpret sensor data in real-time, allowing their
vehicles to recognize objects, make decisions, and navigate roads autonomously. Similarly, the
Swedish Transport Administration recently started working with computer vision and machine
learning specialists to optimize the country’s road infrastructure management.
Some Applications of Machine Learning
Machine learning applications are all around us, often working behind the scenes to enhance
our daily lives. Here are some real-world examples:
Recommendation systems
Recommendation systems are one of the most visible applications of machine learning.
Companies like Netflix and Amazon use machine learning to analyze your past behavior and
recommend products or movies you might like.
Voice assistants
Voice assistants like Siri, Alexa, and Google Assistant use machine learning to understand your
voice commands and provide relevant responses. They continually learn from your interactions
to improve their performance.
Fraud detection
Banks and credit card companies use machine learning to detect fraudulent transactions. By
analyzing patterns of normal and abnormal behavior, they can flag suspicious activity in real
time.
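A very small stand-in for "patterns of normal and abnormal behavior" is a z-score check: amounts far from a customer's usual spending are flagged. The transaction history, the threshold of 3 standard deviations, and the function name are assumptions; real fraud systems use many more signals and learned models.

```python
from statistics import mean, stdev


def flag_suspicious(history, new_amounts, threshold=3.0):
    """Return new amounts more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8, 39.9, 44.1]

flagged = flag_suspicious(history, [43.0, 950.0, 55.0])
print(flagged)
```

The "normal" pattern is learned from the customer's own history, so the same amount could be routine for one account and suspicious for another.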
Social media
Social media platforms use machine learning for a variety of tasks, from personalizing your feed
to filtering out inappropriate content.
Machine Learning Tools
In the world of machine learning, having the right tools is just as important as understanding
the concepts. These tools, which include programming languages and libraries, provide the
building blocks to implement and deploy machine learning algorithms. Let’s explore some of
the most popular tools in machine learning:
Python for machine learning
Python is a popular language for machine learning due to its simplicity and readability, making
it a great choice for beginners. It also has a strong ecosystem of libraries that are tailored for
machine learning.
Libraries such as NumPy and Pandas are used for data manipulation and analysis, while
Matplotlib is used for data visualization. Scikit-learn provides a wide range of machine learning
algorithms, and TensorFlow and PyTorch are used for building and training neural networks.
R for machine learning
R is another language widely used in machine learning, particularly for statistical analysis. It has
a rich ecosystem of packages that make it easy to implement machine learning algorithms.
Packages like caret, mlr, and randomForest provide a variety of machine learning algorithms,
from regression and classification to clustering and dimensionality reduction.
TensorFlow
TensorFlow is a powerful open-source library for numerical computation, particularly well-
suited for large-scale machine learning. It was developed by the Google Brain team and
supports both CPUs and GPUs.
TensorFlow allows you to build and train complex neural networks, making it a popular choice
for deep learning applications.
Scikit-learn
Scikit-learn is a Python library that provides a wide range of machine learning algorithms for
both supervised and unsupervised learning. It’s known for its clear API and detailed
documentation.
Scikit-learn is often used for data mining and data analysis, and it integrates well with other
Python libraries like NumPy and Pandas.
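A short sketch of the Scikit-learn workflow is shown below, assuming the library is installed. It uses the bundled iris data set and a decision tree; the choice of model, the 25% test split, and the `max_depth` value are illustrative, not recommendations from this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features (X) and species labels (y) from the bundled iris data set.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate on examples unseen in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a small decision tree, then score it on the held-out data.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(test_accuracy)
```

Most Scikit-learn estimators share the same `fit` and `predict` methods, so swapping in a different model usually means changing only the line that constructs it.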
Keras
Keras is a high-level neural networks API written in Python. It originally ran on top of
TensorFlow, CNTK, or Theano; Keras 3 supports TensorFlow, JAX, and PyTorch as backends.
It was developed with a focus on enabling fast experimentation.
Keras provides a user-friendly interface for building and training neural networks, making it a
great choice for beginners in deep learning.
The Top Machine Learning Careers in 2023
Machine learning has opened up a wide range of career opportunities. From data science to AI
engineering, professionals with machine learning skills are in high demand. Let’s explore some
of these career paths:
Data scientist
A data scientist uses scientific methods, processes, algorithms, and systems to extract
knowledge and insights from structured and unstructured data. Machine learning is a key tool
in a data scientist’s arsenal, allowing them to make predictions and uncover patterns in data.
Key skills:
• Statistical analysis
• Programming (Python, R)
• Machine learning
• Data visualization
• Problem-solving
Essential tools:
• Python
• R
• SQL
• Hadoop
• Spark
• Tableau
Machine learning engineer
A machine learning engineer designs and implements machine learning systems. They
run machine learning experiments using programming languages like Python and R,
work with datasets, and apply machine learning algorithms and libraries.
Key skills:
• Statistics
• System design
Essential tools:
• Python
• TensorFlow
• Scikit-learn
• PyTorch
• Keras
Research scientist
Key skills:
• Programming (Python, R)
• Research methodology
Essential tools:
• Python
• R
• TensorFlow
• PyTorch
• MATLAB
There are many resources available to learn these basics. Online platforms like Khan Academy
and Coursera offer courses in mathematics and programming. Books like “Think Stats” and
“Python Crash Course” are also good starting points.
Choose the right tools
Choosing the right tools is crucial in machine learning. Python, along with libraries like NumPy,
Pandas, and Scikit-learn, is a popular choice due to its simplicity and versatility.
Learn machine learning algorithms
Once you’re comfortable with the basics, you can start learning about machine learning
algorithms. Start with simple algorithms like linear regression and decision trees before moving
on to more complex ones like neural networks.
Work on projects
Working on projects is a great way to gain practical experience and reinforce what you’ve
learned. Start with simple projects like predicting house prices or classifying iris species, and
gradually take on more complex projects.
In recent years, machine learning and artificial intelligence (AI) have dominated parts of
data science, playing a
critical role in data analytics and business intelligence. Machine learning automates the process
of data analysis and goes further to make predictions based on collecting and analyzing large
amounts of data on certain populations. Models and algorithms are built to make this happen.
Skills needed
To build a career in data science, such as becoming a data scientist, you’ll want to gain
programming and data analytics skills.
• Strong knowledge of programming languages such as R, SAS, and more
• Familiarity working with large amounts of structured and unstructured data
• Comfortable with processing and analyzing data for business needs
• Understanding of math, statistics, and probability
• Data visualization and data wrangling skills
• Knowledge of machine learning algorithms and models
• Good communication and teamwork skills
Careers in data science
Besides the obvious career as a data scientist, there are plenty of other data science jobs to
choose from.
• Data scientist: Uses data to understand and explain the phenomena around them, to
help organizations make better decisions.
• Data analyst: Gathers, cleans, and studies data sets to help solve business problems.
• Data engineer: Builds systems that collect, manage, and transform raw data into
information for business analysts and data scientists.
• Business intelligence analyst: Gathers, cleans, and analyzes sales and customer data,
interprets it, and shares findings with business teams.
Skills needed
To become a successful machine learning engineer, you’ll need to be well-versed in the
following:
• Expertise in computer science, including data structures, algorithms, and architecture
• Strong understanding of statistics and probability
• Knowledge of software engineering and systems design
• Programming knowledge, such as Python, R, and more
• Ability to conduct data modeling and analysis
Careers in machine learning
If you decide to pursue a career in machine learning and artificial intelligence, there are several
options to choose from.
• Machine learning engineer: Researches, builds, and designs the AI responsible for
machine learning, and maintains or improves existing AI systems.
• Computational linguist: Develops and designs computers that deal with how human
language works.