Tutorial 3
Business Analytics and Machine Learning
Definition
• Analytics is a field of computer science that uses math, statistics, and machine
learning to find meaningful patterns in data.
• Business Analytics: analytics applied to make business decisions.
• Analytics may be considered as a three-step process:
2. Predictive Analytics (forecasting): What is likely to happen in the future? Here,
facts and information from the past are used to anticipate the course the
business may take. (Techniques: regression, decision trees, machine learning, etc.)
EDA contd…
Types of variables
• Tools: Tableau, Python, QlikView, SAS Visual Analytics, Power BI, R, etc.
EDA contd…
Correlation
• Correlation analysis is a statistical method used to discover whether there is a
relationship between two variables/datasets, and how strong that relationship may be.
EDA contd…
Methods to find Correlation Coefficient
• Pearson Coefficient (generally, useful for linear relationship between two
continuous variables)
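As a rough illustration of the Pearson coefficient, the sketch below computes it from scratch using only the Python standard library. The function name `pearson` and the advertising/sales figures are made up for this example; they are not part of the tutorial.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: advertising spend vs. sales (sales = 2 * spend + 5)
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]
r = pearson(ad_spend, sales)  # perfectly linear, so r is 1 (up to rounding)
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.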
1. First you would look at what spam typically looks like. You might notice
that some words or phrases (such as “4U,” “credit card,” “free,” and
“amazing”) tend to come up a lot in the subject.
2. You would write a detection algorithm for each of the patterns that you
noticed, and your program would flag emails as spam if a number of these
patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good
enough.
Traditional approach
Machine Learning approach
Problem with Traditional approach
• If spammers notice that all their emails containing “4U” are
blocked, they might start writing “For U” instead. A spam
filter using traditional programming techniques would need to
be updated to flag “For U” emails. If spammers keep working
around your spam filter, you will need to keep writing new
rules forever.
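The problem can be sketched with a tiny hand-written rule filter. This is only an illustration: the pattern list reuses the example phrases from the slides, while the function name `is_spam` and the `threshold` parameter are assumptions made for this sketch.

```python
# Hand-written rules: phrases that were noticed in spam subjects
SPAM_PATTERNS = ["4u", "credit card", "free", "amazing"]

def is_spam(subject, patterns=SPAM_PATTERNS, threshold=1):
    """Flag a subject line if at least `threshold` known patterns appear in it."""
    text = subject.lower()
    hits = sum(1 for pattern in patterns if pattern in text)
    return hits >= threshold

caught = is_spam("Great deal 4U today")      # matches the "4u" rule
missed = is_spam("Great deal For U today")   # no rule matches the new wording

# The only fix is to add yet another rule by hand
patched = is_spam("Great deal For U today", patterns=SPAM_PATTERNS + ["for u"])
```

Every new trick from the spammers requires another manual entry in the pattern list, which is the maintenance burden described above.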
Automatically adapting to change
Types of Machine Learning Systems
• Broadly classifying:
1. Supervised learning
In supervised learning, the training data you feed to the algorithm includes the
desired solutions, called labels.
The spam filter is a good example of this: it is trained with many example emails
along with their class (spam or ham), and it must learn how to classify new emails.
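A minimal sketch of this idea is shown below: a small Naive Bayes style classifier that learns word counts from labelled emails instead of relying on hand-written rules. The four training subjects, the function names `train_spam_filter` and `classify`, and the use of add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def train_spam_filter(examples):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability score."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # class prior
        n_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training data (label = desired solution)
training = [
    ("free credit card offer", "spam"),
    ("amazing deal for u", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train_spam_filter(training)
first = classify("amazing free offer", word_counts, label_counts)
second = classify("monday meeting", word_counts, label_counts)
```

Retraining on fresh labelled examples updates the word counts automatically, so no rule has to be rewritten by hand.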
• Supervised learning deals with two distinct kinds of problems:
Classification problems
Classification problems are often solved using algorithms such as Naïve Bayes,
Support Vector Machines, Random Forest, and Logistic Regression (which predicts
the probability of a binary (yes/no) event occurring).
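To illustrate how logistic regression turns inputs into a yes/no probability, the sketch below shows only the prediction step. The weights, bias, and feature values are invented for the example; in practice they would be learned from labelled training data.

```python
import math

def predict_proba(features, weights, bias):
    """Probability of the 'yes' class: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-trained parameters (assumed values, not from the tutorial)
weights = [0.8, -0.5]
bias = -1.0

# z = -1.0 + 0.8 * 3.0 - 0.5 * 1.0 = 0.9, and sigmoid(0.9) is about 0.71
p = predict_proba([3.0, 1.0], weights, bias)
label = "yes" if p >= 0.5 else "no"
```

The 0.5 cut-off used here is a common default; a different threshold can be chosen depending on the cost of each kind of error.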
Regression problems
Regression problems are often solved using linear regression, non-linear regression,
Bayesian linear regression, etc.
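As a small example of a regression technique, the following sketch fits a straight line by ordinary least squares with a single input variable. The function name `fit_line` and the data points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted numeric value for a new input x = 5
```

Unlike classification, the output here is a continuous number rather than a class label.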
• Clustering
k-Means
Hierarchical Cluster Analysis (HCA)
Expectation Maximization
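The k-Means idea can be sketched in a few lines: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The version below is a simplified illustration; the function name `kmeans`, the fixed iteration count, the choice of the first k points as starting centroids, and the sample points are all assumptions.

```python
def kmeans(points, k, iterations=10):
    """Very small k-Means sketch for points given as tuples of floats."""
    # Naive, deterministic initialisation: use the first k points as centroids
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical unlabelled data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

No labels are supplied; the grouping emerges only from the distances between the points.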
• The learning system (agent) can observe the environment, select and
perform actions, and get rewards in return (or penalties in the form of
negative rewards).
• It does not have a labelled dataset or results associated with data so the only
way to perform a given task is to learn from experience.
• https://www.youtube.com/watch?v=1FZ0A1QCMWc
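The agent, action, and reward loop described above can be sketched with a toy example. Everything here is assumed for illustration: the environment has two actions with fixed rewards (-1 as a penalty, +1 as a reward), and the agent uses a simple epsilon-greedy strategy, which is one common approach and not necessarily the one intended by the tutorial.

```python
import random

def run_agent(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning action values purely from experience."""
    rng = random.Random(seed)
    rewards = {0: -1.0, 1: 1.0}   # hypothetical environment: action -> reward
    q = [0.0, 0.0]                # estimated value of each action
    counts = [0, 0]               # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                   # explore: random action
        else:
            action = max(range(2), key=lambda a: q[a])  # exploit: best estimate
        reward = rewards[action]
        counts[action] += 1
        # Incremental average of the rewards observed for this action
        q[action] += (reward - q[action]) / counts[action]
        total_reward += reward
    return q, counts, total_reward

q, counts, total_reward = run_agent()
best_action = max(range(2), key=lambda a: q[a])
```

The agent is never told which action is correct; it discovers the better action only through the rewards and penalties it receives.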
Main Challenges of Machine Learning
• Insufficient or poor-quality data
It should be noted, however, that small- and medium-sized datasets are still very
common, and it is not always easy or cheap to get extra training data, so don’t
abandon algorithms just yet.
Main Challenges of Machine Learning (Contd.)
• Nonrepresentative Training Data
In order to generalize well, it is crucial that your training data be representative of
the new cases you want to generalize to
The set of countries we used earlier for training the linear model was not perfectly
representative; a few countries were missing
It seems that very rich countries are not happier than moderately rich countries (in
fact they seem unhappier), and conversely some poor countries seem happier than
many rich countries.
Main Challenges of Machine Learning (Contd.)
• Poor quality data (training data is full of errors, outliers, and noise)
Possible solutions:
Simplify the model by selecting one with fewer parameters (e.g., a linear model
rather than a high-degree polynomial model), by reducing the number of attributes
in the training data, or by constraining the model
Gather more training data
Reduce the noise in the training data (e.g., fix data errors and remove outliers)
Main Challenges of Machine Learning (Contd.)
• Underfitting the Training Data
A linear model of life satisfaction is prone to underfit; reality is just more complex
than the model
Selecting a more powerful model, with more parameters
Feeding better features to the learning algorithm (feature engineering)
Reducing the constraints on the model