Machine Learning
By
Salman Sadullah Usmani
Apr 11, 2024
Quotes about ML
Just as electricity transformed almost
everything 100 years ago, today I actually
have a hard time thinking of an industry that I
don’t think AI (Artificial Intelligence) will
transform in the next several years
Andrew Ng
Artificial intelligence would be the ultimate version of Google.
The ultimate search engine that would understand everything on
the web. It would understand exactly what you wanted, and it
would give you the right thing. We’re nowhere near doing that
now. However, we can get incrementally closer to that, and that
is basically what we work on.
Larry Page
Artificial Intelligence, deep learning, machine
learning — whatever you’re doing if you don’t
understand it — learn it. Because otherwise you’re
going to be a dinosaur within 3 years
Mark Cuban
AI vs ML vs DL
Definition of ML
• Machine learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and
scarce.
• Example: People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven”
(www.amazon.com)
• DeepMind: Healthcare
Types of ML algorithms
• Supervised Learning
Classification
Regression
• Unsupervised Learning
• Reinforcement Learning
Unsupervised Learning
• Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled data
sets.
• These algorithms discover hidden patterns in data without the need for human intervention (hence, they are
“unsupervised”)
• Clustering
• Association
• Dimensionality reduction
Clustering
• Clustering is a data mining technique for grouping unlabeled data based on their
similarities or differences.
• For example, K-means clustering algorithms assign similar data points into groups,
where the K value specifies the number of groups and thus the granularity of the clustering.
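To make the K-means idea concrete, here is a minimal sketch using scikit-learn; the two-blob toy data and the choice K = 2 are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two loose blobs (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # blob around (0, 0)
               rng.normal(5, 1, (50, 2))])   # blob around (5, 5)

# K = 2: ask for two clusters; K controls the granularity of the grouping.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # one centroid per cluster
print(kmeans.labels_[:10])       # cluster assignments for the first 10 points
```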
Association
• Basket analysis:
P (Y | X ), the probability that somebody who buys X also buys Y, where X and Y are products/services.
Dimensionality Reduction
• It reduces the number of data inputs to a manageable size while also preserving the data
integrity.
• Often, this technique is used in the data pre-processing stage, such as when autoencoders
remove noise from visual data to improve picture quality.
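The slide mentions autoencoders; as a simpler classical sketch of the same idea, here is PCA from scikit-learn reducing ten correlated inputs to two components (the synthetic data and the choice of PCA are illustrative assumptions, not the slide's exact method).

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples with 10 correlated features driven by 2 latent factors (synthetic).
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(100, 10))

# Project the 10 inputs down to 2 components while preserving most variance.
pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance kept per component
```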
Supervised Learning
• A machine learning approach that’s defined by its use of labeled datasets.
• These datasets are designed to train or “supervise” algorithms into classifying data or predicting
outcomes accurately.
• Using labeled inputs and outputs, the model can measure its accuracy and learn over time.
• Supervised learning can be separated into two types of problems when data mining:
classification and regression.
Supervised vs Unsupervised
Classification
• Classification problems use an algorithm to accurately assign test data into specific categories,
such as separating apples from oranges.
• Linear classifiers, support vector machines, decision trees and random forest are all common
types of classification algorithms.
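A minimal sketch of one of the listed classifiers, a decision tree, trained on scikit-learn's built-in iris dataset; the dataset and split are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Labeled dataset: flower measurements (inputs) and species (categories).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a decision tree, one of the common classifiers listed above.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Held-out test data measures how well the model assigns categories.
print(accuracy_score(y_test, clf.predict(X_test)))
```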
Classification
• Face recognition: must handle pose, lighting, occlusion (glasses, beard), make-up, and hair style.
• Sensor fusion: combine multiple modalities, e.g., visual (lip image) and acoustic signals, for
speech recognition.
Regression
• Regression models are helpful for predicting numerical values based on different data points,
such as sales revenue projections for a given business.
• Some popular regression algorithms are linear regression, logistic regression and polynomial
regression (despite its name, logistic regression is most often used for classification).
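A minimal linear-regression sketch on synthetic numeric data; the underlying relationship y ≈ 3x + 2 and the noise level are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y is roughly 3x + 2 plus noise (made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # should be close to 3 and 2

# Predict a numerical value for a new data point.
print(model.predict([[4.0]]))          # roughly 3*4 + 2 = 14
```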
Classification vs Clustering
Reinforcement Learning
• Reinforcement Learning is a part of machine learning. Here, agents are self-trained on reward
and punishment mechanisms.
• It’s about taking the best possible action or path to gain maximum rewards and minimum
punishment through observations in a specific situation. It acts as a signal to positive and
negative behaviors.
• Through a series of Trial and Error methods, an agent keeps learning continuously in an
interactive environment from its own actions and experiences. The only goal of it is to find a
suitable action model which would increase the total cumulative reward of the agent. It learns via
interaction and feedback.
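Here is a minimal tabular Q-learning sketch on a made-up one-dimensional corridor (states 0 to 4, reward only at the right end); the environment, hyperparameters, and episode count are all simplifying assumptions, not a standard benchmark.

```python
import numpy as np

# Corridor: states 0..4; action 0 = left, 1 = right; reward 1 at state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != 4:                        # episode ends at the goal state
        # Trial and error: mostly exploit the current estimate, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else Q[s].argmax()
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0  # reward signal from the environment
        # Update the action-value estimate toward reward plus discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:4])   # learned policy: action 1 (right) in every non-terminal state
```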
Workflow of the Machine Learning Process
Data preprocessing
Feature Selection
Resampling Techniques
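To make the workflow steps above concrete, here is a rough sketch of preprocessing, feature selection, and resampling chained together in a scikit-learn pipeline; the dataset, scaler, k-best selection, and 5-fold cross-validation are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),               # data preprocessing
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("model", LogisticRegression(max_iter=1000)),
])

# Resampling: 5-fold cross-validation re-fits the whole pipeline on each split.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```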
Evaluation Metrics
Importance: Evaluation metrics help measure the performance and
effectiveness of machine learning models. They provide a way to assess
how well the model performs on unseen data and guide improvements.
1. Accuracy
•Definition: The proportion of correct predictions out of total
predictions.
•Formula: (True Positives + True Negatives) / (True Positives + True
Negatives + False Positives + False Negatives)
•Example: If you classify 100 instances and correctly predict 90 of
them, the accuracy is 90%.
2. Precision
•Definition: The proportion of true positive predictions out of total
positive predictions made by the model.
•Formula: True Positives / (True Positives + False Positives)
•Example: In a binary classification task, if you predict 10 positives and
8 are true positives, precision is 80%.
3. Recall (Sensitivity)
•Definition: The proportion of actual positive instances that the model correctly identifies.
•Formula: True Positives / (True Positives + False Negatives)
•Example: In a medical diagnosis, if there are 50 sick patients and the model correctly identifies 45, recall is 90%.
4. F1-Score
•Definition: Harmonic mean of precision and recall. A balance between precision and recall.
•Formula: 2 * (Precision * Recall) / (Precision + Recall)
•Example: If precision is 80% and recall is 90%, the F1-score is about 84.7% (slightly below the arithmetic mean of 85%, as the harmonic mean penalizes imbalance).
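A sketch computing items 1-4 (accuracy, precision, recall, F1) with scikit-learn; the label vectors are made up so the counts can be checked by hand.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical binary labels: 1 = positive, 0 = negative.
# TP = 3, FN = 1, FP = 1, TN = 5 for these vectors.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 8/10 = 0.8
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))          # harmonic mean of the two = 0.75
```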
5. Mean Squared Error (MSE)
•Definition: Measures the average squared difference between actual and predicted values.
•Formula: (Sum of squared errors) / Number of predictions
•Example: Used in regression tasks, a low MSE indicates that the model's predictions are close to the actual values.
6. Root Mean Squared Error (RMSE)
•Definition: The square root of MSE, giving a metric in the same units as the output.
•Example: If the MSE is 4, the RMSE is 2.
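A small numpy sketch of MSE and RMSE on made-up actual and predicted values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (made up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # model predictions (made up)

mse = np.mean((y_true - y_pred) ** 2)     # average squared error = 0.875
rmse = np.sqrt(mse)                        # same units as the target, ~0.935

print(mse, rmse)
```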
7. Receiver Operating Characteristic (ROC) Curve
•Definition: A graph showing the trade-off between the true positive rate (recall) and the false positive rate (1-specificity) at different thresholds.
•Application: Helps evaluate classification models, particularly when classes are imbalanced.
8. Area Under the ROC Curve (AUC-ROC)
•Definition: Measures the area under the ROC curve. A higher AUC-
ROC value indicates better model performance.
•Example: An AUC-ROC of 0.5 means the model is random, while 1.0
indicates perfect discrimination.
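A sketch of the ROC trade-off and AUC using scikit-learn on hypothetical labels and scores:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one point per threshold
print(list(zip(fpr, tpr)))                          # the ROC trade-off pairs
print(roc_auc_score(y_true, y_score))               # 1.0 = perfect, 0.5 = random
```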
9. Confusion Matrix
•Definition: A table showing the number of true positives, false
positives, true negatives, and false negatives.
•Use: Helps visualize the model's performance and identify potential
areas for improvement.
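A sketch of the confusion matrix for the same made-up labels used in the metrics example above:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are actual classes, columns are predicted: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
# [[5 1]
#  [1 3]]
```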
10. Log Loss
•Definition: Measures the uncertainty of the model's predictions.
•Formula: Log Loss = −(1/N) Σᵢ [yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ)], where yᵢ is the actual
label, pᵢ is the predicted probability, and N is the number of predictions.
•Example: Lower log loss indicates a better model.
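A sketch of log loss computed both directly from the formula above and via scikit-learn, on made-up labels and probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])      # actual labels (made up)
p = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities of class 1

# Direct application of the formula above.
manual = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(manual, log_loss(y_true, p))   # the two values agree, ~0.3
```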
11. Mean Absolute Error (MAE)
•Definition: MAE quantifies the average distance between actual values and predicted values, taking the absolute difference between them. It
provides an idea of how close the predictions are to the actual values.
•Formula: MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
• Where:
• n is the number of data points.
• yᵢ is the actual value of the i-th data point.
• ŷᵢ is the predicted value for the i-th data point.
• The absolute difference |yᵢ − ŷᵢ| is calculated for each data point, and the mean is taken.
•Interpretation:
• A lower MAE indicates that the predictions are closer to the actual values.
• MAE provides a clear and intuitive measure of the model's performance.
•Advantages:
• Easy to understand and interpret.
• Not as sensitive to outliers as some other metrics, such as mean squared error (MSE).
•Disadvantages:
• Does not differentiate between positive and negative errors, treating all errors equally.
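The MAE formula above, applied directly in numpy to the same made-up values as the MSE example:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (made up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predictions (made up)

mae = np.mean(np.abs(y_true - y_pred))    # (1/n) * sum of |y_i - y^_i|
print(mae)   # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
```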
12. Gini Coefficient
•Definition: The Gini coefficient measures the impurity of a data set or node in a decision tree. It calculates the probability
that a randomly chosen instance from the data set will be misclassified if it were to be randomly labeled.
•Formula:
• Given a set of classes in a data set, the Gini coefficient is calculated as:
• Gini = 1 − Σⱼ pⱼ² (summing over the k classes)
• Where:
• k is the number of classes.
• pⱼ is the probability (or proportion) of class j in the data set.
•Range:
• The Gini coefficient ranges from 0 to 1 − 1/k.
• A value of 0 indicates perfect purity, where all instances belong to one class.
• The maximum value, 1 − 1/k, indicates maximum impurity, where the instances are equally distributed among the k classes.
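A direct implementation of the Gini formula above; the class proportions passed in are hypothetical, chosen to show the two extremes and a middle case.

```python
import numpy as np

def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    p = np.asarray(proportions, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(gini([1.0, 0.0]))   # 0.0: perfectly pure node
print(gini([0.5, 0.5]))   # 0.5: maximum impurity for k = 2 (i.e., 1 - 1/2)
print(gini([0.7, 0.3]))   # 0.42: somewhere in between
```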
Confusion Matrix
Model Overfitting and Underfitting
In machine learning, finding the right balance between fitting a model too closely or too loosely to the data is essential for good
model performance. Understanding overfitting and underfitting helps in selecting the most appropriate model complexity.
Overfitting
A model is overfitted when it captures not only the underlying patterns in the data but also the noise and fluctuations in the training
set.
•Causes:
• Excessive model complexity (e.g., too many parameters, high-degree polynomial)
• Insufficient training data relative to model complexity
•Consequences:
• Poor generalization to new, unseen data
• High variance and low bias in predictions
• High training accuracy but low testing accuracy
•Solutions:
• Simplify the model (e.g., reduce the number of parameters)
• Use regularization techniques (e.g., L1 or L2 regularization)
• Cross-validation to select appropriate model complexity
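A sketch of the overfitting symptom (high training accuracy, low testing accuracy) and an L2-regularization fix, using a deliberately over-complex polynomial model on noisy data; the degree, noise level, and alpha are arbitrary assumptions.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Noisy samples from a simple underlying curve (made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Degree-15 polynomial: enough parameters to memorize the training noise.
over = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X_tr, y_tr)
print(over.score(X_tr, y_tr), over.score(X_te, y_te))   # typically near-perfect train, poor test

# Same complexity plus L2 regularization (Ridge) generalizes better.
reg = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X_tr, y_tr)
print(reg.score(X_tr, y_tr), reg.score(X_te, y_te))     # the train/test gap shrinks
```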
Underfitting
A model is underfitted when it is too simple and fails to capture the underlying patterns in the data.
•Causes:
• Insufficient model complexity (e.g., too few parameters)
• Lack of relevant features or poor feature engineering
• Inadequate training data for the model
•Consequences:
• Poor performance on both training and testing data
• High bias and low variance in predictions
• The model may fail to identify relationships in the data
•Solutions:
• Increase model complexity (e.g., add more features or layers)
• Use more sophisticated algorithms
• Feature engineering (e.g., creating new features or transformations)
• Increase the size of the training dataset
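The complementary sketch: a plain linear model underfits curved data, and adding polynomial features (more model complexity) fixes it; the data and the choice of degree 3 are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

# Curved data that a straight line cannot capture (made up for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-4, 4, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

line = LinearRegression().fit(X, y)            # too simple: underfits
cubic = make_pipeline(PolynomialFeatures(3),   # added complexity via new features
                      LinearRegression()).fit(X, y)

# The straight line explains little of the variance; the cubic fits far better.
print(line.score(X, y), cubic.score(X, y))
```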
Take-away Message: Statistical Challenges in Machine Learning
•Challenges:
• Dealing with missing data
• Handling outliers
• Multicollinearity: Correlation among input features
• Imbalanced datasets: Unequal distribution of classes
•Solutions:
• Data imputation, scaling, and normalization
• Robust statistics for handling outliers
• Feature selection and engineering
• Class weighting or resampling for imbalanced classes (several of these are combined in the sketch below)
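A closing sketch tying the listed solutions together: median imputation for missing values, scaling, and class weighting for imbalance, in one scikit-learn pipeline; the synthetic data and every parameter choice are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic imbalanced data with missing entries (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (rng.random(200) < 0.1).astype(int)   # ~10% positives: imbalanced classes
X[rng.random(X.shape) < 0.05] = np.nan    # ~5% of entries missing

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),           # missing data
    ("scale", StandardScaler()),                            # scaling/normalization
    ("model", LogisticRegression(class_weight="balanced")), # imbalance
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```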