
Module 1 (ML)

Machine learning is essential for businesses as it enables the utilization of vast amounts of data that were previously underused due to integration challenges and lack of awareness about analytical tools. Its popularity is driven by the availability of large datasets, reduced storage costs, and advancements in algorithms, making it a key component of AI, data science, and statistics. Machine learning encompasses various types, including supervised, unsupervised, and reinforcement learning, each serving different purposes and applications across industries.

Uploaded by

ananyanand354
Copyright
© All Rights Reserved

MODULE 1

Chapter 1

1. Why is machine learning needed for business organizations?

Business organizations generate and use huge amounts of data in their daily activities. Earlier, the full
potential of this data was not utilized for two reasons. First, data was scattered across different
archive systems, and organizations were not able to integrate these sources fully. Second, there was a
lack of awareness about software tools that could help unearth useful information from data. Now
business organizations have started to use the latest technology, machine learning, for this purpose.

2. List out the factors that drive the popularity of machine learning.

Machine learning has become popular for three main reasons:

1. Availability of Large Volumes of Data – Companies like Facebook, Twitter, and YouTube
generate vast amounts of data, which grows exponentially every year. This abundance of
data provides a strong foundation for training machine learning models.
2. Reduction in Storage and Hardware Costs – The cost of data storage and computational
hardware has significantly decreased, making it easier and more affordable to collect,
process, store, and analyze large datasets.
3. Advancements in Machine Learning Algorithms – The development of more powerful algorithms,
especially with the rise of deep learning, has enabled machines to perform complex tasks
such as image recognition, natural language processing, and autonomous decision-making.

3. What is a model?

A model in machine learning is a mathematical representation of a system that learns patterns from
data and makes predictions or decisions based on new inputs.
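As a minimal illustration, the Python sketch below treats the simplest possible model, a straight line y = a + bx, as the "mathematical representation": its two parameters are learned from example data by ordinary least squares and then reused to predict new inputs. The weekly sales figures and the names fit_line and predict are hypothetical.

```python
# Hypothetical training data: week number -> units sold.
weeks = [1, 2, 3, 4]
sales = [1.2, 1.9, 2.5, 3.2]

def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# "Learning": the pattern in the data is captured as two parameters.
a, b = fit_line(weeks, sales)

def predict(x):
    """Apply the learned model to a new input."""
    return a + b * x

print(f"model: y = {a:.2f} + {b:.2f}x, prediction for week 5: {predict(5):.2f}")
```

Once fitted, the model is just the pair (a, b); predicting for a new week requires no access to the original data.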

4. Distinguish between the terms: Data, Information, Knowledge, and Intelligence.

1. Data – Raw, unprocessed facts and figures. It can be in the form of numbers, text, images, or
other formats, but by itself, it has no meaningful interpretation.
Example: A list of sales transactions.
2. Information – Processed data that provides meaning. It includes patterns, relationships, and
associations derived from raw data.
Example: Monthly sales totals computed from daily sales transactions.
3. Knowledge – A deeper understanding derived from information. It involves recognizing
trends, making predictions, and drawing useful insights.
Example: Historical sales data showing seasonal trends in customer purchases.
4. Intelligence – The actionable application of knowledge to make decisions or automate
processes. It involves reasoning and applying insights effectively.
Example: A machine learning model predicting future sales based on past trends and
recommending inventory adjustments.
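The four levels can be sketched in a few lines of Python. The transactions, the "above-average month means peak season" rule, and the inventory recommendation are all hypothetical stand-ins, chosen only to show how each level is derived from the one below it.

```python
from collections import defaultdict

# Data: raw sales transactions as (month, amount); values are hypothetical.
transactions = [
    ("Jan", 120), ("Jan", 80), ("Feb", 60), ("Feb", 40),
    ("Nov", 150), ("Nov", 100), ("Dec", 200), ("Dec", 220),
]

# Information: monthly totals computed from the raw transactions.
monthly_totals = defaultdict(int)
for month, amount in transactions:
    monthly_totals[month] += amount

# Knowledge: months whose total exceeds the average month are treated as peak season
# (a deliberately simple rule standing in for real trend analysis).
average = sum(monthly_totals.values()) / len(monthly_totals)
peak_months = sorted(m for m, total in monthly_totals.items() if total > average)

# Intelligence: turn the knowledge into an action, e.g. an inventory recommendation.
recommendations = {m: "increase inventory" for m in peak_months}
print(dict(monthly_totals), peak_months, recommendations)
```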
5. What is machine learning? How is machine learning linked to AI, Data Science, and
Statistics?
Machine learning is closely linked to Artificial Intelligence (AI), Data Science, and Statistics, as
it serves as a key component in each field.
Definition:
Machine learning is an important sub-branch of Artificial Intelligence. According to Arthur
Samuel, machine learning is the "field of study that gives computers the ability to learn
without being explicitly programmed."
According to Tom Mitchell, "A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if its performance on T, as measured
by P, improves with experience E."
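Mitchell's definition can be made concrete with a toy learner. In this hypothetical sketch, the task T is labelling the integers 0 to 99 as 0 or 1, the experience E is a growing list of labelled examples, and the performance measure P is accuracy. The hidden rule (x >= 50) and the midpoint-threshold learner are assumptions for illustration, not a standard algorithm from the text.

```python
def true_label(x):
    # Hidden rule the learner must discover (assumed for illustration).
    return 1 if x >= 50 else 0

def learn_threshold(examples):
    """Place the cut-off midway between the largest 0-example and the smallest 1-example."""
    max_low = max(x for x, y in examples if y == 0)
    min_high = min(x for x, y in examples if y == 1)
    return (max_low + min_high) / 2

def accuracy(threshold, test_points):
    # Performance measure P: fraction of test points labelled correctly.
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_points)
    return correct / len(test_points)

# Experience E: labelled examples arriving over time (hypothetical values).
experience = [(10, 0), (70, 1), (30, 0), (60, 1), (45, 0), (55, 1)]
test_points = range(100)   # Task T: label every integer 0..99

scores = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_points))
print(scores)  # [0.9, 0.95, 1.0]
```

Performance on T, measured by P, rises as E grows, which is exactly the condition in Mitchell's definition.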

1. Machine Learning and AI

1. Machine learning is a subfield of AI, which focuses on developing intelligent agents that can learn and make decisions.

2. Early AI focused on logic-based reasoning, but its progress was slow, leading to periods known as "AI winters."

3. The resurgence of AI happened due to data-driven approaches, where machine learning extracts patterns from data to make predictions.

4. Deep learning, a subset of machine learning, further strengthened AI by using neural networks to process vast amounts of data.

2. Machine Learning and Data Science

1. Data science is an umbrella term that encompasses multiple fields, including machine learning, big data, and data analytics.

2. Machine learning relies on data to identify trends and patterns, making it a crucial part of data science.

3. Big Data: Machine learning processes vast datasets characterized by volume, variety, and velocity to perform tasks like language translation and image recognition.

4. Data Mining: Data mining and machine learning overlap, but while data mining focuses on extracting hidden patterns, machine learning applies these patterns for predictive modeling.

5. Data Analytics: Predictive data analytics is closely linked to machine learning, as both use similar algorithms to analyze and interpret data.

6. Pattern Recognition: A specialized engineering field that applies machine learning to classify and analyze patterns in data.

3. Machine Learning and Statistics

1. Statistics is a mathematical field that provides the theoretical foundation for statistical learning, which machine learning builds upon.

2. While statistics focuses on hypothesis testing and verifying data relationships, machine learning emphasizes pattern recognition and predictive modeling.

3. Statistical methods rely on structured mathematical models, whereas machine learning can adapt and improve through experience.

Machine learning serves as a bridge between AI, data science, and statistics, enabling intelligent
decision-making, data-driven insights, and predictive analytics across various industries.

6. List out the types of Machine Learning.
Types of Machine Learning
Machine Learning (ML) is broadly categorized into three types based on how the model learns from
data:
1. Supervised Learning
1. In supervised learning, the algorithm learns from labelled data, where each input has
a corresponding output.
2. It is used for classification and regression tasks.
3. Examples:
1. Classification: Email spam detection (Spam/Not Spam)
2. Regression: Predicting house prices based on features like size and location
4. Common Algorithms: Linear Regression, Logistic Regression, Decision Trees,
Support Vector Machines (SVM), and Neural Networks.
2. Unsupervised Learning
1. In unsupervised learning, the algorithm learns patterns and relationships from
unlabelled data without predefined outputs.
2. It is mainly used for clustering, association rule mining, and dimensionality reduction.
3. Examples:
1. Clustering: Customer segmentation in marketing
2. Association: Market basket analysis (identifying product purchase patterns)
4. Common Algorithms: K-Means Clustering, Hierarchical Clustering, Principal
Component Analysis (PCA), and Apriori Algorithm.
3. Reinforcement Learning (RL)
1. In reinforcement learning, an agent learns by interacting with an environment and
receiving rewards or penalties for its actions.
2. It is used in decision-making applications where sequential learning is required.
3. Examples:
1. Game-playing AI (e.g., AlphaGo)
2. Robotics and autonomous vehicle navigation
4. Common Algorithms: Q-Learning, Deep Q Networks (DQN), and Policy Gradient
Methods.
Additionally, some hybrid approaches exist, such as Semi-Supervised Learning (combining labelled
and unlabelled data) and Self-Supervised Learning (where the model generates its own labels).
7. List out the differences between a model and a pattern. "Patterns are local and a model is global
for the entire dataset." Justify.
Differences Between a Model and a Pattern

| Feature | Model | Pattern |
|---|---|---|
| Definition | A mathematical or computational representation used to make predictions or decisions based on data. | A repeated or recognizable structure found within a dataset. |
| Scope | Global: applies to the entire dataset. | Local: exists in specific portions of the dataset. |
| Function | Generalizes insights from data to make predictions on new data. | Identifies relationships, trends, or anomalies in subsets of data. |
| Examples | A regression equation predicting house prices from features. | A common trend where high-income individuals prefer a certain product. |
| Learning Process | Requires training using algorithms to fit data and minimize errors. | Can be detected through statistical analysis or visualization techniques. |

Justification: Patterns are Local, and Models are Global


Patterns are local because they are found in specific subsets of data rather than applying universally.
For example, in sales data, a seasonal increase in demand for winter clothing is a pattern observed in
specific months but not applicable throughout the year.
Models, on the other hand, are global because they capture relationships within the entire dataset and
make predictions for new, unseen data. A trained machine learning model for sales forecasting
considers multiple factors and generalizes insights for all seasons rather than specific patterns in
winter alone.
Thus, while patterns provide insights into localized relationships within data, models generalize these
insights for broader applications.

8. Are classification and clustering the same or different? Justify.

| Feature | Classification | Clustering |
|---|---|---|
| Definition | A supervised learning technique that assigns data points to predefined categories based on labelled training data. | An unsupervised learning technique that groups similar data points into clusters without predefined labels. |
| Learning Type | Supervised Learning | Unsupervised Learning |
| Objective | Predicts the class label of new data points based on prior knowledge. | Identifies natural groupings or structures in the data. |
| Input Data | Requires labelled data for training (i.e., data with known class labels). | Works with unlabelled data, discovering patterns without prior labels. |
| Output | A discrete class label (e.g., "Spam" or "Not Spam"). | A cluster assignment, where similar data points are grouped together. |
| Examples | Email spam detection, disease diagnosis, sentiment analysis. | Customer segmentation, image grouping, anomaly detection. |
| Algorithms | Decision Trees, Support Vector Machines (SVM), Neural Networks, Naïve Bayes. | K-Means, DBSCAN, Hierarchical Clustering, Gaussian Mixture Models. |

Justification: Classification and Clustering are Different


Classification and clustering are different because of their learning approach and purpose.
Classification is supervised and requires labelled training data, while clustering is unsupervised
and finds hidden structures in unlabelled data.
For example, in email filtering:
▪ Classification assigns new emails into "Spam" or "Not Spam" based on past
labelled examples.
▪ Clustering groups emails based on content similarity, without predefined categories
(e.g., grouping promotional, social, and personal emails).
Thus, classification is used when predefined labels exist, whereas clustering is used when data needs
to be explored and grouped without prior knowledge.
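The supervised side of this contrast can be sketched with a nearest-centroid classifier, a simple stand-in for the classification algorithms listed above: it cannot work without the predefined "Spam"/"Not Spam" labels, because it builds one centroid per labelled class. The two features (number of links and occurrences of the word "free") and all values are hypothetical. A clustering algorithm given the same points without labels would have to discover the groups on its own.

```python
import math

# Labelled training emails: (number of links, count of "free") -> label.
# Feature choice and values are hypothetical, for illustration only.
training = [
    ((8, 5), "Spam"), ((7, 6), "Spam"), ((9, 4), "Spam"),
    ((1, 0), "Not Spam"), ((0, 1), "Not Spam"), ((2, 0), "Not Spam"),
]

def class_centroids(examples):
    """Average the feature vectors of each labelled class."""
    sums, counts = {}, {}
    for (x1, x2), label in examples:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}

def classify(point, centroids):
    # Assign the predefined label whose centroid is closest (Euclidean distance).
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

centroids = class_centroids(training)
print(classify((6, 3), centroids))   # Spam
print(classify((1, 1), centroids))   # Not Spam
```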

9. List out the differences between labelled and unlabelled data.

Differences Between Labelled and Unlabelled Data

| Feature | Labelled Data | Unlabelled Data |
|---|---|---|
| Definition | Data that includes both input features and corresponding output labels. | Data that contains only input features without predefined labels. |
| Supervision | Used in supervised learning, where models learn from known output labels. | Used in unsupervised learning, where models find patterns and structures on their own. |
| Data Annotation | Requires manual labelling, which can be time-consuming and expensive. | Does not require manual labelling, making it easier to obtain in large quantities. |
| Usage | Used for tasks like classification and regression (e.g., predicting disease diagnosis from medical records). | Used for clustering, anomaly detection, and association rule learning. |
| Examples | A dataset where images of cats and dogs are labelled as "Cat" or "Dog." | A dataset with customer purchase history but no predefined categories. |
| Algorithms | Decision Trees, Support Vector Machines (SVM), Neural Networks. | K-Means Clustering, DBSCAN, Principal Component Analysis (PCA). |

10. Point out the differences between supervised and unsupervised learning.
Differences Between Supervised and Unsupervised Learning

| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Definition | A type of machine learning where the model is trained on labelled data (input-output pairs). | A type of machine learning where the model learns patterns and structures from unlabelled data. |
| Data Type | Uses labelled data (each input has a corresponding output). | Uses unlabelled data (no predefined outputs). |
| Objective | To map inputs to known outputs and make accurate predictions. | To find hidden patterns, relationships, or clusters within the data. |
| Algorithms | Linear Regression, Decision Trees, Random Forest, Neural Networks, Support Vector Machines (SVM). | K-Means Clustering, Hierarchical Clustering, DBSCAN, Principal Component Analysis (PCA). |
| Examples | Email spam detection (Spam/Not Spam), fraud detection, image classification. | Customer segmentation, market basket analysis, anomaly detection. |
| Learning Process | The model learns from past examples and generalizes to new data. | The model discovers structures or groupings without prior knowledge. |
| Outcome | A predictive model that classifies or predicts new data points. | A grouping or structure in data without predefined categories. |
11. What are the differences between classification and regression?

1. Definition:

1. Classification: Categorizes data into predefined classes or labels.

2. Regression: Predicts continuous numerical values.

2. Output Type:

1. Classification: Discrete values (e.g., "Spam" or "Not Spam").

2. Regression: Continuous values (e.g., predicting house prices).

3. Algorithms Used:

1. Classification: Decision Trees, SVM, k-NN.

2. Regression: Linear Regression, Polynomial Regression, SVR.

4. Evaluation Metrics:

1. Classification: Accuracy, Precision, Recall, F1-score.

2. Regression: Mean Squared Error (MSE), R-squared, Mean Absolute Error (MAE).

5. Real-World Applications:

1. Classification: Email spam detection, disease diagnosis, sentiment analysis.

2. Regression: Stock price prediction, sales forecasting, weather prediction.
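The evaluation metrics named above can be computed directly from their definitions. This is a plain-Python sketch with made-up predictions; in practice a library such as scikit-learn provides equivalent functions. Here "Spam" is assumed to be the positive class, and R-squared is computed as 1 - SS_res / SS_tot.

```python
def classification_metrics(y_true, y_pred, positive="Spam"):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)   # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)   # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)   # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error and R-squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - sum(e * e for e in errors) / ss_tot        # 1 - SS_res / SS_tot
    return mse, mae, r2

# Made-up predictions for illustration.
labels_true = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam"]
labels_pred = ["Spam", "Not Spam", "Not Spam", "Spam", "Spam"]
acc, prec, rec, f1 = classification_metrics(labels_true, labels_pred)

prices_true = [200, 300, 400]   # e.g. house prices in thousands
prices_pred = [210, 290, 420]
mse, mae, r2 = regression_metrics(prices_true, prices_pred)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# accuracy=0.60 precision=0.67 recall=0.67 f1=0.67
print(f"MSE={mse:.1f} MAE={mae:.2f} R2={r2:.2f}")
# MSE=200.0 MAE=13.33 R2=0.97
```

Classification metrics compare discrete labels for equality, while regression metrics measure how far continuous predictions are from the true values, which mirrors the output-type difference above.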

12. What is semi-supervised learning?

Semi-Supervised Learning

Semi-supervised learning is a machine learning approach that lies between supervised and
unsupervised learning. It uses both labeled and unlabeled data to train a model.

Key Points:

1. Uses Labeled & Unlabeled Data → A small amount of labeled data and a large amount of
unlabeled data.

2. Bridges Supervised & Unsupervised Learning → Leverages labeled data to guide learning
from unlabeled data.

3. Cost-Effective → Labeling data is expensive, so semi-supervised learning helps reduce the
need for large labeled datasets.

4. Improves Model Accuracy → Additional data, even if unlabeled, can help the model generalize
better.

5. Applications → Medical diagnosis, speech recognition, fraud detection, and web content
classification.

Example:
Google Photos: It learns from a few labeled images (faces you tag) and then categorizes
other similar faces automatically.
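The idea of letting a few labels guide learning from many unlabelled points can be sketched as a simple self-training (pseudo-labelling) loop. This is one illustrative variant, assuming a 1-nearest-neighbour rule on a number line: in each round, the unlabelled point closest to any labelled point inherits that point's label. The data and the labels "A"/"B" are hypothetical.

```python
# Two labelled points and several unlabelled ones on a number line (hypothetical data).
labelled = {1.0: "A", 9.0: "B"}
unlabelled = [1.5, 2.0, 2.6, 3.1, 7.2, 7.9, 8.4]

def self_train(labelled, unlabelled):
    """Pseudo-label the unlabelled pool one point at a time, nearest point first."""
    labelled = dict(labelled)          # copy so the caller's dict is untouched
    pool = list(unlabelled)
    while pool:
        # Closest (unlabelled, labelled) pair across the whole pool.
        u, l = min(((u, l) for u in pool for l in labelled),
                   key=lambda pair: abs(pair[0] - pair[1]))
        labelled[u] = labelled[l]      # the point inherits its neighbour's label
        pool.remove(u)
    return labelled

result = self_train(labelled, unlabelled)
print(sorted(result.items()))
```

Only two points were labelled by hand, yet every point ends up with a label; the unlabelled points near each seed spread that seed's label outward.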

13. List out the differences between reinforcement learning and supervised learning.

Differences Between Reinforcement Learning and Supervised Learning

| Feature | Reinforcement Learning (RL) | Supervised Learning (SL) |
|---|---|---|
| Definition | Learns by interacting with the environment and receiving rewards. | Learns from labeled data with correct input-output pairs. |
| Goal | Maximizes cumulative rewards through trial and error. | Minimizes error by mapping input to correct output. |
| Data Requirement | No labeled data; learns from experience. | Requires a large labeled dataset. |
| Feedback Type | Delayed feedback (reward or penalty after an action). | Immediate feedback (correct answer provided). |
| Examples | Self-driving cars, game playing (e.g., AlphaGo), robotics. | Spam detection, image classification, speech recognition. |
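To illustrate learning by trial and error from delayed rewards, here is a minimal tabular Q-learning sketch (Q-learning is one of the RL algorithms listed earlier). The environment, a five-cell corridor where only reaching the last cell gives a reward of 1, and the hyperparameters (alpha, gamma, epsilon, episode count) are assumptions chosen for illustration. No correct answers are ever given; the agent only sees rewards.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed toy environment)
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Environment: returns (next_state, reward, done). Reward arrives only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                          # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:                                     # exploit, breaking ties randomly
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # expected: {0: 1, 1: 1, 2: 1, 3: 1}, i.e. move right everywhere
```

The reward is only received at the end of the corridor, yet the discount factor propagates its value back to earlier cells. This is the "delayed feedback" entry in the table above.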

14. List out important classification and clustering algorithms.


Refer module 1 chapter 1 for answer.

15. List out five major applications of machine learning.

Five Major Applications of Machine Learning

1. Healthcare & Medical Diagnosis

1. Disease detection (e.g., cancer diagnosis, diabetic retinopathy).

2. Predicting patient outcomes.

2. Finance & Fraud Detection

1. Credit scoring and loan approval.

2. Detecting fraudulent transactions in banking.

3. E-Commerce & Recommendation Systems


1. Personalized product recommendations (e.g., Amazon, Netflix).

2. Dynamic pricing strategies.

4. Self-Driving Cars & Robotics

1. Autonomous vehicle navigation (Tesla, Waymo).

2. Industrial robots for automation.

5. Natural Language Processing (NLP) & Chatbots

1. Virtual assistants (Siri, Alexa, ChatGPT).

2. Sentiment analysis in social media.


Long Questions
1.Explain in detail the machine learning process model.

2.List out and briefly explain the classification algorithms.


3. List out and briefly explain the unsupervised algorithms.
Unsupervised learning is the second kind of learning: learning by self-instruction. As the name
suggests, there is no supervisor or teacher component. In the absence of a supervisor or teacher,
the learner must instruct itself, discovering structure in the data without being told the
correct answers.
Here, the program is supplied with objects, but no labels are defined. The algorithm itself
observes the examples and recognizes patterns based on the principles of grouping. Grouping is
done in such a way that similar objects form the same group.
Two types of unsupervised algorithms:
1. Cluster analysis
2. Dimensionality reduction
1. Cluster Analysis
Cluster analysis is an example of unsupervised learning. It aims to group objects into disjoint
clusters or groups based on their attributes. All the data objects in a partition are similar in
some aspect and differ significantly from the data objects in the other partitions.
Some examples of clustering processes are segmentation of a region of interest in an image,
detection of abnormal growth in a medical image, and determining clusters of signatures in a
gene database.
An example of a clustering scheme is shown in Figure 1.9, where the clustering algorithm takes a
set of dog and cat images and groups them into two clusters: dogs and cats. It can be observed
that the samples belonging to a cluster are similar, while samples in different clusters differ
markedly.

Some of the key clustering algorithms are:


1. k-means algorithm
2. Hierarchical algorithms
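The k-means algorithm listed above can be sketched as Lloyd's iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when nothing changes. The points and the choice of initial centroids are hypothetical; real implementations usually pick initial centroids randomly or with k-means++.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:   # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabelled 2-D points forming two loose groups; k = 2 initial centroids.
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (2, 1)])
print(clusters)
```

Even though both initial centroids start inside the left-hand group, the update step pulls one of them toward the right-hand group, and the algorithm settles on the two natural clusters after a few iterations.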

2. Dimensionality Reduction
Dimensionality reduction algorithms are also examples of unsupervised algorithms. They take
higher-dimensional data as input and output the data in a lower dimension, taking advantage of
the variance of the data. The task is to reduce the dataset to fewer features without losing
its generality.
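One common dimensionality reduction technique is Principal Component Analysis (PCA), named earlier among the unsupervised algorithms. The sketch below reduces hypothetical 2-D points to 1-D by projecting them onto the direction of greatest variance, found with power iteration on the 2x2 covariance matrix. It assumes the points are not all identical (otherwise the covariance is zero and the normalisation divides by zero).

```python
import math

def pca_1d(points, iters=100):
    """Reduce 2-D points to 1-D by projecting onto the first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Power iteration: repeatedly multiply by the covariance matrix and renormalise.
    # Assumes the start vector (1, 0) is not orthogonal to the principal direction.
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    # Each point's single new coordinate is its projection onto v.
    projected = [x * v[0] + y * v[1] for x, y in centred]
    return v, projected

# Hypothetical points lying roughly along the line y = x.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
direction, projected = pca_1d(points)
print(direction, projected)
```

Because the points lie close to a line, a single coordinate along that line keeps almost all of their variance, which is the sense in which the reduced dataset does not lose generality.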

Numerical Problems
1. Let us assume a regression algorithm generates the model
y = 0.54 + 0.66x for the weekly sales data of a product. Here, x is the week and y is the
product sales. Find the predictions for the 5th and 8th weeks.
Solution:
y = 0.54 + 0.66x

Prediction for 5th week:


x=5
y = 0.54 + 0.66(5)
y = 0.54 + 3.3
y = 3.84
The predicted sales for the 5th week are 3.84.

Prediction for 8th week:


x=8
y = 0.54 + 0.66(8)
y = 0.54 + 5.28
y = 5.82
The predicted sales for the 8th week are 5.82.
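The same arithmetic can be checked with a short Python function. The model coefficients come from the problem statement; the function name predict_sales is hypothetical.

```python
def predict_sales(week, intercept=0.54, slope=0.66):
    """Evaluate the given regression model y = 0.54 + 0.66x for week x."""
    return intercept + slope * week

for week in (5, 8):
    print(f"Week {week}: predicted sales = {predict_sales(week):.2f}")
# Week 5: predicted sales = 3.84
# Week 8: predicted sales = 5.82
```

Formatting to two decimals hides tiny floating-point rounding (for example 3.8400000000000003), which matches the hand calculation.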
NUMERICALS (2-4)
2. Give two examples of patterns and models.
Patterns in machine learning refer to regularities, structures, or trends
that can be identified from data and used for making predictions or decisions.
Here are some common examples of patterns in different areas of machine
learning:
1. Supervised Learning Patterns
• Regression Patterns
• Classification Patterns
2. Unsupervised Learning Patterns
• Clustering Patterns
• Dimensionality Reduction Patterns
Examples of models (supervised learning models):
These models are trained on labelled data, where both input features and
the target (output) are provided. The model learns the relationship between
the features and the target.
1. Linear Models
• Linear Regression
• Logistic Regression
2. Decision Tree
• Decision Tree Classifier
• Decision Tree Regressor

3. Five applications of Machine Learning

• Healthcare: Disease Diagnosis and Prediction
• Finance: Fraud Detection
• Retail and E-commerce: Recommendation Systems
• Transportation and Autonomous Vehicles: Autonomous Driving
• Natural Language Processing (NLP): Sentiment Analysis

4. Five products that use Machine Learning


1. Amazon Alexa
2. Netflix
3. Tesla Autopilot
4. Spotify
5. Google Photos
