Unit - 3 - ML
What is Machine Learning?
Machine learning empowers computers to learn and improve from experience without being explicitly programmed for every task. It has emerged as a powerful tool for analyzing complex data, recognizing patterns, and making intelligent decisions based on vast amounts of information. It enables computers to adapt and evolve their performance over time, making it a crucial technology across many fields.
A machine learning system processes data, detects patterns, and makes predictions or decisions based on this learned knowledge. This process is similar to how humans learn from experience and use that knowledge to make decisions. Machine learning algorithms learn from data and improve their performance on a specific task without being explicitly programmed for that task. The primary goal of machine learning is to allow computers to learn patterns and make predictions or decisions based on the data they observe.
In supervised learning, the model is trained on labeled data, where each input is paired with the corresponding correct output. The algorithm learns the mapping from inputs to outputs and applies it to new, unseen data. In unsupervised learning, the model is trained on unlabeled data, without explicit guidance on the correct answers; the model must discover structure in the data on its own. As data availability and computing power grow, the impact of machine learning is expected to increase further, opening new possibilities for innovation and problem-solving across various domains.
Machine Learning vs. Traditional Programming
Approach:
• Traditional Programming: Developers write explicit instructions and rules that dictate how the computer should process data and produce outputs.
• Machine Learning: Systems learn and improve from data. Instead of explicit programming, algorithms are trained on examples and derive their own rules.
Decision-Making:
• Traditional Programming: Programs follow predefined rules and logic to make decisions. These rules are typically hardcoded into the program.
• Machine Learning: Machine learning models are data-driven and learn patterns from examples. They derive patterns and relationships from data, allowing them to make decisions on inputs they have never seen before.
Adaptability:
• Traditional Programming: Changing a program's behavior requires manual updates to the codebase. The program's flexibility is limited to what the developers anticipated.
• Machine Learning: Machine learning models can adapt to new data and evolve their behavior without direct human intervention. This adaptability makes them well suited to dynamic environments.
Complexity:
• Traditional Programming: Handling complex tasks requires code with numerous rules and conditions, leading to code that is difficult to maintain.
• Machine Learning: Machine learning excels at dealing with complex tasks and large amounts of data, as it can find intricate patterns that may be challenging to express as explicit rules.
Suitability:
• Traditional Programming: Best for problems with well-defined rules and clear logic, where the problem's structure is fully understood.
• Machine Learning: Best for problems involving pattern recognition and large, complex datasets.
Interpretability:
• Traditional programs are generally easier to inspect and explain than machine learning models.
Machine Learning Working Process
Data Collection:
The first step is to gather relevant and representative data that will be used to train the machine learning model. The quality and size of the dataset play a crucial role in the model's performance.
Data Preprocessing:
The collected data is cleaned and transformed into a suitable format for training. This may involve handling missing values, scaling features, and encoding categorical variables.
Data Splitting:
The dataset is divided into two or more subsets: the training set, validation set, and test set. The training set is used to train the model, the validation set is used to fine-tune the model's hyperparameters, and the test set is used to evaluate the final model.
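The split described above can be sketched in plain Python. The function name and the 60/20/20 fractions are illustrative choices, not from the original notes:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a dataset and cut it into training, validation, and test subsets."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set

train_set, val_set, test_set = train_val_test_split(list(range(100)))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

Shuffling before splitting matters: if the data is ordered (for example, by class), an unshuffled split would give the model an unrepresentative training set.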
Model Selection:
An appropriate algorithm is chosen based on the type of problem (e.g., classification or regression) and the nature of the data.
Model Training:
During this step, the chosen model is fed with the training data to learn patterns and relationships within the data. The model's parameters are adjusted iteratively to minimize the difference between its predictions and the actual target values.
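As a minimal illustration of "adjusting parameters iteratively to minimize the difference between predictions and targets", here is a tiny perceptron learning the logical AND function. This is a sketch for intuition only; integer weights keep the arithmetic exact:

```python
# Training data for the AND function: inputs paired with correct outputs.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0, 0]   # model parameters (weights)
b = 0        # bias term

for epoch in range(10):
    for (x1, x2), target in data:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        error = target - pred        # difference between target and prediction
        w[0] += error * x1           # nudge each parameter to shrink the error
        w[1] += error * x2
        b += error

preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data]
print(preds)  # [0, 0, 0, 1] -- the model has learned AND
```

Each pass over the data moves the parameters only when a prediction is wrong; after a few epochs the updates stop because every example is classified correctly.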
Model Evaluation:
After training, the model's performance is evaluated using the validation set or cross-validation. Metrics such as accuracy, precision, recall, and F1-score are used to assess how well the model generalizes to new, unseen data.
Hyperparameter Tuning:
Many machine learning models have hyperparameters that control their behavior (for example, a learning rate or a tree depth). These are not learned from the training data directly; instead, they are tuned by trying candidate values and comparing performance on the validation set.
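A hyperparameter search can be as simple as a loop over candidate values, keeping whichever scores best on the validation set. The sketch below tunes k for a one-dimensional nearest-neighbour classifier; the data points and candidate values are made up for illustration:

```python
# (feature, label) pairs; both sets are toy data for this sketch.
train_pts = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1)]
val_pts   = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]

def knn_predict(points, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

best_k, best_err = None, float("inf")
for k in (1, 3, 5):                       # candidate hyperparameter values
    err = sum(knn_predict(train_pts, x, k) != y for x, y in val_pts)
    if err < best_err:                    # keep the best validation score
        best_k, best_err = k, err

print(best_k, best_err)  # 1 0
```

Note that k is never adjusted by the training procedure itself; it is chosen from outside by comparing validation error, which is exactly what distinguishes a hyperparameter from a learned parameter.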
Model Testing:
Once the model is fine-tuned and performs well on the validation set, it is
evaluated on the test set to assess its final performance on completely unseen
data. This step helps gauge the model's ability to generalize to real-world
scenarios.
Model Deployment:
Once the model performs well on the test set, it is deployed into a production environment to make predictions on new data. Deployed models must be monitored to ensure they continue to perform well over time. As new data becomes available or the data distribution changes, the model may need retraining or updates.
Uses of Machine Learning
Image and Speech Recognition: Machine learning is widely used in image and speech recognition, powering applications such as facial recognition and voice assistants.
Healthcare: Machine learning assists with diagnosis and drug discovery, transforming the healthcare industry.
Marketing and Recommendations: Machine learning personalizes product recommendations and targets advertising campaigns.
Robotics: Machine learning is central to robotics, enabling robots to learn from their environment and adapt to various tasks.
Fraud Detection: Machine learning algorithms are employed to detect fraudulent transactions and activities, for example in banking and e-commerce.
Challenges of Machine Learning
Data Quality and Quantity: Machine learning algorithms heavily depend on large
and high-quality datasets for effective training. Obtaining and preparing such
datasets can be time-consuming and resource-intensive, and poor data quality
can lead to biased or inaccurate models.
Data Preprocessing: Cleaning, preprocessing, and transforming raw data into a
suitable format for training can be complex and require domain knowledge.
Incorrect preprocessing can negatively impact model performance.
Overfitting and Underfitting: Balancing model complexity to avoid overfitting
(fitting too closely to the training data) or underfitting (being too simplistic) is
challenging. Achieving the right balance is crucial for good generalization on
unseen data.
Algorithm Selection and Hyperparameter Tuning: Selecting the most appropriate
machine learning algorithm for a specific problem and fine-tuning its
hyperparameters require expertise and experimentation.
Interpretability and Explainability: Many machine learning models, especially
deep learning models, lack transparency and are considered "black boxes."
Understanding the reasons behind model predictions is crucial for gaining user
trust, especially in critical applications.
Computational Resources: Some machine learning algorithms, particularly deep
learning models, demand significant computational power and memory, which
can be a limitation for resource-constrained environments.
Transfer Learning and Generalization: Transferring knowledge from one task or
domain to another effectively requires careful consideration and adaptation of
the model.
Handling Imbalanced Data: In cases where one class is significantly more
prevalent than others, machine learning models may perform poorly on
underrepresented classes, leading to biased outcomes.
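One common mitigation is to resample the training data so the model sees both classes equally often. A minimal sketch of random oversampling of the minority class (the labels and counts are illustrative):

```python
import random

rng = random.Random(0)
# 90 examples of class 0 versus only 10 of class 1 -- heavily imbalanced.
data = [(f"a{i}", 0) for i in range(90)] + [(f"b{i}", 1) for i in range(10)]

majority = [ex for ex in data if ex[1] == 0]
minority = [ex for ex in data if ex[1] == 1]

# Draw minority examples with replacement until both classes are equally large.
balanced = majority + [rng.choice(minority) for _ in range(len(majority))]

counts = (sum(1 for _, y in balanced if y == 0),
          sum(1 for _, y in balanced if y == 1))
print(counts)  # (90, 90)
```

Oversampling duplicates information rather than creating it, so it should be applied only to the training split, never to the validation or test data.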
Privacy and Security: Machine learning models can inadvertently learn sensitive
information from the data they are trained on, posing privacy risks. It is
essential to implement privacy-preserving techniques when handling sensitive
data.
Continuous Learning and Adaptation: Adapting machine learning models to
changing data distributions or concept drift can be challenging. Continuous
learning and retraining are often required to maintain model performance.
Deployment and Integration: Deploying machine learning models into real-world
applications and integrating them with existing systems can be complex and
require careful consideration of scalability, efficiency, and maintenance.
Ethical and Bias Concerns: Machine learning models can inherit biases present
in the training data, leading to discriminatory or unfair outcomes. Addressing
biases and ensuring ethical use of machine learning models is critical.
Email Spam Filtering
Problem/Task: Automatically filter out spam emails from your inbox.
How Machine Learning Works in This Scenario
1. Data Collection:
1. Training Data: You collect a large set of emails, some of which are labeled as "spam" and
others as "not spam."
2. Features: For each email, you extract features like:
1. Presence of certain keywords (e.g., "Free," "Win," "Discount")
2. Sender's email address
3. Frequency of links
4. Email subject length
5. Use of specific phrases
2. Model Selection:
1. You choose a machine learning algorithm that suits your problem. Common choices include:
1. Naive Bayes: Simple and effective for text classification.
2. Support Vector Machines (SVM): Good for handling high-dimensional data.
3. Neural Networks: More complex, can capture intricate patterns.
3. Training:
1. Learning Process: The chosen model is trained using the labeled data. The algorithm
learns patterns and associations between the features and the labels (spam or not
spam).
2. Optimization: The model's parameters are adjusted to minimize the error
(misclassification of spam vs. non-spam) during the training phase.
4. Validation:
1. After training, the model is tested on a separate set of emails (not used during training) to see how well it performs.
2. Metrics: You might look at accuracy, precision, and recall to evaluate the model's performance.
5. Deployment:
1. Once validated, the model is deployed to automatically classify incoming emails in real time.
6. Continuous Learning:
1. The model can be updated regularly with new data (e.g., when you mark an email as spam manually), allowing it to adapt to new types of spam emails.
Example in Action
• Training: Suppose you have 10,000 emails. The model analyzes these emails
and learns that emails with words like "Congratulations" and "Prize" are
more likely to be spam.
• Prediction: A new email arrives with the subject "Win a Free Cruise!" The
model checks the features and predicts it is spam with a high probability.
• Outcome: The email is automatically moved to the spam folder.
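The Naive Bayes approach mentioned above can be sketched end to end in a few lines of plain Python. The toy emails, the add-one smoothing, and the equal class priors are all illustrative assumptions for this sketch, not part of the original notes:

```python
import math
from collections import Counter

# Tiny labeled training set (step 1: data collection).
spam_docs = ["win a free prize now", "congratulations you win a free cruise"]
ham_docs  = ["meeting moved to monday", "please review the project report"]

def word_counts(docs):
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

spam_counts, ham_counts = word_counts(spam_docs), word_counts(ham_docs)
vocab = set(spam_counts) | set(ham_counts)

def log_score(words, counts, prior):
    """Log-probability of the words under one class, with add-one smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for w in words:
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(email):
    words = email.split()
    spam_score = log_score(words, spam_counts, 0.5)  # equal priors assumed
    ham_score = log_score(words, ham_counts, 0.5)
    return "spam" if spam_score > ham_score else "not spam"

print(classify("win a free cruise"))         # spam
print(classify("project meeting on monday")) # not spam
```

Words like "win" and "free" appear often in the spam training emails, so they pull the spam score up for the new message, mirroring the "Example in Action" above. Add-one smoothing keeps unseen words from driving a class probability to zero.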
Personalized Recommendations in E-commerce