DIR Notes 1
Chapter 1 - Introduction to Machine Learning
30/01/2025
Human Learning
Other tasks
What do we do?
• Just like what we did with Tarzan, we need to provide experience to the machine
• First, we give the machine enough data
• This data is called training data
What is ML?
• ML is a field of artificial intelligence where computers learn to
perform tasks without being explicitly programmed, relying on
patterns and inference instead.
ML Paradigms
Depending on the nature of the problem or the type of dataset, ML methods are classified into:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
Supervised Learning
• Supervised learning is a type of machine learning where the algorithm is trained on labeled data: each training example is paired with the correct output, and the model learns to map inputs to outputs.
Unsupervised ML
• Unsupervised learning is a type of machine learning where the
algorithm is given unlabeled data and must find patterns,
relationships, or structures within the data without explicit guidance
on the correct output.
• The goal of unsupervised learning is to explore the inherent structure
of the data, discover hidden patterns, or group similar data points.
Unsupervised Learning
Clustering
Clustering – grouping similar items together
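The clustering idea above can be sketched with a minimal k-means loop, a common unsupervised algorithm that alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its cluster. The data, k=2, and the naive initialization are illustrative assumptions, not part of the notes:

```python
def kmeans(points, k, iters=10):
    """Minimal 2-D k-means: alternate nearest-centroid assignment and mean update."""
    centroids = list(points[:k])   # naive init; real implementations pick random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centroid's cluster.
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:   # keep the old centroid if a cluster goes empty
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),    # one group near (1, 1)
          (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]    # another group near (8, 8)
centroids, clusters = kmeans(points, k=2)
```

With two well-separated groups, the loop converges in a couple of iterations and recovers one centroid per group.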
Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning paradigm where an agent learns to make decisions by
interacting with an environment. The agent receives feedback in the form of rewards or penalties based on the actions it
takes in the environment. The goal of reinforcement learning is for the agent to learn a strategy or policy that maximizes
the cumulative reward over time.
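The agent–environment loop described above can be sketched with tabular Q-learning on a toy 5-state corridor, where the agent earns a reward only for reaching the rightmost state. The environment, reward scheme, and hyperparameters are illustrative assumptions:

```python
import random

# Toy environment: states 0..4 on a line; action 0 = left, 1 = right.
# Reaching state 4 gives reward +1 and ends the episode.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
for _ in range(500):                        # episodes of trial and error
    s, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        a = rng.randrange(2) if rng.random() < EPSILON else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy: the best action in each state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

After training, the policy chooses "right" from every non-terminal state, i.e. the strategy that maximizes cumulative reward.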
Types of ML Algorithms
Regression models
• If the labels are continuous
• These models can predict a continuous output from previously learned experience
Regression is a type of statistical modeling technique used in ML to analyze the relationships between dependent and independent variables. The goal of regression is to predict a continuous numerical output based on one or more input features.
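A minimal sketch of the idea: ordinary least squares fits a line to labeled (x, y) pairs, then predicts a continuous value for an unseen input. The data points below are illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (single input feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept fits the means.
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 5.0, 6.9, 9.0]       # roughly y = 2x + 1
a, b = fit_line(xs, ys)
pred = a * 5.0 + b              # predict a continuous value for unseen x = 5
```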
Examples for RL
• Reinforcement Learning (Trial & Error → Learn from Rewards)
Example 1: Self-Driving Cars
A car learns to drive by interacting with the environment, receiving rewards for
safe driving and penalties for accidents.
Action: Turn, accelerate, brake
Reward: Safe driving (positive) / Collision (negative)
ML in Mechanical Engineering
• ML is increasingly being integrated into Mechanical Engineering to optimize
processes, improve efficiency, and enhance predictive capabilities.
Example 1. Predictive Maintenance
ML models analyze sensor data (vibration, temperature, pressure, etc.) to predict
equipment failures before they occur.
Helps in reducing downtime and maintenance costs.
Used in industries like aerospace, automotive, and manufacturing.
Continue..
Example 3: Quality Control & Defect Detection
• ML-powered computer vision inspects manufacturing defects using high-
resolution images.
• Reduces human errors in quality inspection.
• Used in automotive, aerospace, and additive manufacturing.
Example 4: Smart Manufacturing:
ML improves factory automation through digital twin and real-time process
optimization.
Used in CNC machining, welding, and robotic manufacturing.
2. Multiclass:
Car make: Toyota, Range Rover, FJ Cruiser, BMW, Benz, Honda
Fruit type: Mango, Orange, Apple, Pineapple
Steps involved
1. The first task is to find features that can represent the objects.
To summarize
1. Identify the features
2. Represent the vehicles by their features
3. Remove non-informative features
4. Build a classification model from the features (training the model)
5. Perform classification of unknown data (testing the model)
6. Optimize the model's scores
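The training-then-testing workflow above can be sketched with a tiny nearest-centroid classifier. The feature choice (weight in tonnes, number of wheels) and the data are hypothetical:

```python
# Hypothetical feature vectors: (weight_tonnes, n_wheels) per labeled vehicle.
train = {
    "car":   [(1.2, 4), (1.5, 4), (1.1, 4)],
    "truck": [(9.0, 6), (12.0, 8), (10.5, 6)],
}

# "Training": compute one centroid (mean feature vector) per class.
centroids = {
    label: tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))
    for label, vecs in train.items()
}

# "Testing": classify unknown data by its nearest class centroid.
def classify(x):
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))

label = classify((10.0, 6))   # an unseen heavy, six-wheeled vehicle
```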
EDA continue..
3. Data Visualization:
• Use visualizations like histograms, box plots, scatter plots, and correlation matrices to
explore relationships and distributions.
4. Feature Analysis:
• Analyze the characteristics and properties of individual features.
• Assess the impact of each feature on the target variable.
5. Correlation Analysis:
• Investigate relationships between variables using correlation matrices.
6. Outlier Detection:
• Identify and handle outliers that might negatively impact model performance.
• Understand whether outliers are errors or genuine data points.
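Outlier detection (item 6) is often done with Tukey's interquartile-range rule: flag values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. A minimal sketch with illustrative data (real EDA code would typically use pandas or NumPy):

```python
def iqr_outliers(values):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's rule)."""
    xs = sorted(values)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(xs) - 1)
        i, frac = int(pos), pos - int(pos)
        j = min(i + 1, len(xs) - 1)
        return xs[i] * (1 - frac) + xs[j] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

data = [10, 12, 11, 13, 12, 11, 95]   # 95 looks like a suspicious reading
outliers = iqr_outliers(data)
```

Whether a flagged point is an error or a genuine observation still requires domain judgment, as the slide notes.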
EDA cont.
7. Data Transformation:
• Explore the need for data transformations, such as normalization or scaling, based on
the distribution of the data.
8. Missing Data Handling:
• Understand the extent of missing data and devise strategies for handling it
(imputation, removal, etc.).
9. Target Variable Analysis:
• For supervised learning tasks, analyze the distribution of the target variable.
• Identify potential class imbalances or regression challenges.
Feature Engineering
• Feature engineering is the process of transforming raw data into a format that is
better suited for machine learning models.
• Feature engineering plays a crucial role in improving the accuracy and efficiency of
models by providing them with more meaningful and relevant input data.
1. Training Set:
• The training set is used to train the ML model. The model learns the patterns
and relationships within the data using this subset.
• The training set comprises a significant portion of the available data, typically
around 70-80% of the entire dataset, depending on the size and
characteristics of the data.
Validation set
• In addition to the training and test sets, ML researchers often use a third
dataset called the validation set.
• The validation set is used to fine-tune model hyperparameters and to
assess the model's performance during intermediate stages of training.
• Typical amount: 10% of dataset
• Training: 70%
• Validation: 10%
• Testing: 20%
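The 70/10/20 split above can be sketched as a shuffle-then-slice helper (libraries such as scikit-learn provide equivalents; this pure-Python version just illustrates the idea):

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle, then cut into train / validation / test portions."""
    items = list(data)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]       # the remaining ~70%
    return train, val, test

train, val, test = train_val_test_split(range(100))
```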
Underfitting
• Underfitting occurs when a model is too simple to capture the underlying patterns in the training data.
• It fails to capture complex relationships in the data.
• Training performance is poor, and the model may perform poorly on both the training and test sets.
Remedies:
• Increase model complexity.
• Add more features.
• Use a more advanced algorithm.
Overfitting:
• Overfitting occurs when a model is too complex and fits the training data
too closely, capturing noise and outliers.
• The model has low bias but high variance.
• It performs very well on the training set but poorly on new, unseen data.
• There is a risk of memorizing noise rather than learning genuine patterns.
Remedies:
1. Use simpler models.
2. Regularize the model (e.g., add regularization terms).
3. Increase the amount of training data.
4. Use feature engineering to reduce dimensionality.
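Remedy 2 (regularization) can be illustrated with one-dimensional ridge regression: a penalty term lam shrinks the learned weight toward zero, trading a little training fit for less variance. The data and penalty value are illustrative:

```python
def ridge_slope(xs, ys, lam):
    """1-D ridge regression (no intercept): minimize sum((y - w*x)^2) + lam * w^2.
    Closed form: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                 # the exact fit is w = 2
w0 = ridge_slope(xs, ys, lam=0.0)    # no penalty: ordinary least squares
w1 = ridge_slope(xs, ys, lam=14.0)   # penalty shrinks the weight toward 0
```

Larger lam means a simpler, more constrained model, which is exactly the lever used against overfitting.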
Fitting of Models
Nominal Data
• Nominal data consists of categories with no inherent order or ranking.
• The categories are distinct and represent different groups without any
implied hierarchy.
• Examples:
• Colors (e.g., red, blue, green)
• Types of fruit (e.g., apple, banana, orange)
• Gender (e.g., male, female, non-binary)
• In nominal data, you can't say that one category is "greater" or "higher" than
another; they are simply different.
Ordinal Data
• Ordinal data also consists of categories, but these categories have a
meaningful order or ranking.
• The intervals between categories are not necessarily uniform or well-defined.
• Examples:
• Educational levels (e.g., high school, bachelor's, master's)
• Customer satisfaction ratings (e.g., low, medium, high)
• Socioeconomic status (e.g., low income, middle income, high income)
• In ordinal data, the order matters, but the precise differences between the
categories may not be known or consistent.
For example, the difference in satisfaction between "low" and "medium" may
not be the same as between "medium" and "high."
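Because ordinal categories carry a meaningful order, a common encoding maps them to ranked integers (unlike nominal data, where such a mapping would impose a false order). The satisfaction scale below follows the slide's example:

```python
# Ordinal categories have a meaningful order, so map them to ranked integers.
SATISFACTION = {"low": 0, "medium": 1, "high": 2}

ratings = ["medium", "high", "low", "high"]
encoded = [SATISFACTION[r] for r in ratings]
```

Note the caveat from the slide still applies: the integer gaps suggest equal spacing that the underlying categories may not actually have.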
Text data
• Text data is a prevalent and valuable form of data in ML, and its analysis
falls under the domain of Natural Language Processing (NLP).
• In NLP, the goal is to enable computers to understand, interpret, and
generate human language. Here are some key aspects and techniques
related to handling text data in machine learning:
• Text Preprocessing:
• Tokenization: Breaking down the text into individual words or tokens.
• Lowercasing: Converting all text to lowercase to ensure consistency.
• Removing Stop words: Eliminating common words (e.g., "the," "and") that don't
carry significant meaning.
• Language models
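The three preprocessing steps above (tokenization, lowercasing, stop-word removal) can be sketched in a few lines; the stop-word list here is a tiny illustrative subset, not a standard one:

```python
import re

STOP_WORDS = {"the", "and", "a", "is", "to"}   # illustrative stop-word list

def preprocess(text):
    """Lowercase, tokenize into words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())   # tokenization + lowercasing
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The cat and the dog")
```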
Example
• For a dataset, the min and max observable values are 30 and -10. Apply min-max normalization to the data point 18.8.
y = (x - x_min) / (x_max - x_min)
y = (18.8 - (-10)) / (30 - (-10))
y = 28.8 / 40
y = 0.72
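The worked example above, as a small helper function:

```python
def min_max_normalize(x, x_min, x_max):
    """Scale x into [0, 1] given the observed minimum and maximum."""
    return (x - x_min) / (x_max - x_min)

y = min_max_normalize(18.8, x_min=-10, x_max=30)   # 28.8 / 40 = 0.72
```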
Z-score normalization
• Also known as standardization, is a preprocessing technique in which
the numerical values of a variable are transformed to have a mean of
0 and a standard deviation of 1.
• This is achieved by subtracting the mean of the variable from each
data point and dividing the result by the standard deviation.
• The formula for Z-score normalization is:
z = (x - μ) / σ
where μ is the mean of the variable and σ is its standard deviation.
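A minimal sketch of the procedure described above, using the standard library (the sample values are illustrative):

```python
import statistics

def z_scores(values):
    """Standardize: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population standard deviation
    return [(v - mu) / sigma for v in values]

zs = z_scores([2.0, 4.0, 6.0])
```

After the transformation the values have mean 0 and standard deviation 1, as the definition requires.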
Z-score Scaling
2. One-Hot Encoding:
• Create binary columns for each category.
• Assign a value of 1 to the column corresponding to the category and 0
otherwise.
• Suitable for nominal categorical data where there is no inherent order.
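A minimal sketch of the encoding described above, applied to the nominal color example from earlier (the category list is illustrative; libraries such as pandas or scikit-learn provide equivalents):

```python
def one_hot(value, categories):
    """Return a binary vector with a 1 in the position of `value`, 0 elsewhere."""
    return [1 if c == value else 0 for c in categories]

colors = ["red", "blue", "green"]
encoded = [one_hot(c, colors) for c in ["blue", "red"]]
```

Because the columns are unordered indicators, no false ranking is imposed on the categories, which is why this suits nominal data.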