Module 1 (ML)
Business organizations use huge amounts of data in their daily activities. Earlier, the full potential of
this data was not utilized, for two reasons. First, the data was scattered across different archive
systems, and organizations were not able to integrate these sources fully. Second, there was a lack of
awareness about software tools that could help unearth useful information from the data. Now
business organizations have started to use the latest technology, machine learning.
2. List out the factors that drive the popularity of machine learning.
1. Availability of Large Volumes of Data – Companies like Facebook, Twitter, and YouTube
generate vast amounts of data, which grows exponentially every year. This abundance of
data provides a strong foundation for training machine learning models.
2. Reduction in Storage and Hardware Costs – The cost of data storage and computational
hardware has significantly decreased, making it easier and more affordable to collect,
process, store, and analyze large datasets.
3. Advancements in Machine Learning Algorithms – The development of sophisticated algorithms,
especially with the rise of deep learning, has enabled machines to perform complex tasks
such as image recognition, natural language processing, and autonomous decision-making.
3. What is a model?
A model in machine learning is a mathematical representation of a system that learns patterns from
data and makes predictions or decisions based on new inputs.
1. Data – Raw, unprocessed facts and figures. It can be in the form of numbers, text, images, or
other formats, but by itself, it has no meaningful interpretation.
Example: A list of sales transactions.
2. Information – Processed data that provides meaning. It includes patterns, relationships, and
associations derived from raw data.
Example: Monthly sales totals computed from daily sales transactions.
3. Knowledge – A deeper understanding derived from information. It involves recognizing
trends, making predictions, and drawing useful insights.
Example: Historical sales data showing seasonal trends in customer purchases.
4. Intelligence – The actionable application of knowledge to make decisions or automate
processes. It involves reasoning and applying insights effectively.
Example: A machine learning model predicting future sales based on past trends and
recommending inventory adjustments.
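The four levels above can be illustrated with a minimal Python sketch. The transaction values, the naive trend-based forecast, and the inventory rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Data: raw daily sales transactions as (date, amount). Values are made up.
transactions = [
    ("2024-01-05", 120.0), ("2024-01-20", 80.0),
    ("2024-02-03", 150.0), ("2024-02-17", 90.0),
    ("2024-03-09", 200.0), ("2024-03-25", 100.0),
]

# Information: monthly totals computed from the daily transactions.
monthly_totals = defaultdict(float)
for date, amount in transactions:
    monthly_totals[date[:7]] += amount  # "YYYY-MM" identifies the month

# Knowledge: the trend across months (are the totals rising?).
months = sorted(monthly_totals)
totals = [monthly_totals[m] for m in months]
rising = all(a < b for a, b in zip(totals, totals[1:]))

# Intelligence: act on the knowledge. A naive forecast extends the average
# month-over-month change; a hypothetical rule turns it into a decision.
avg_change = (totals[-1] - totals[0]) / (len(totals) - 1)
forecast = totals[-1] + avg_change
action = "increase inventory" if forecast > totals[-1] else "hold inventory"

print(totals, rising, forecast, action)
```

A real system would replace the naive forecast with a trained model; the point here is only the progression from raw data to an actionable decision.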
5. What is machine learning? How is machine learning linked to AI, Data Science, and
Statistics?
Machine learning is closely linked to Artificial Intelligence (AI), Data Science, and Statistics, as
it serves as a key component in each field.
Definition:
Machine learning is an important sub-branch of Artificial Intelligence. According to Arthur
Samuel, machine learning is the "field of study that gives computers the ability to learn
without being explicitly programmed".
According to Tom Mitchell, "A computer program is said to learn from experience E, with
respect to task T and some performance measure P, if its performance on T, measured by P,
improves with experience E."
2. Early AI focused on logic-based reasoning, but its progress was slow, leading to
periods known as "AI winters."
3. Big Data: Machine learning processes vast datasets characterized by volume, variety,
and velocity to perform tasks like language translation and image recognition.
4. Data Mining: Data mining and machine learning overlap, but while data mining
focuses on extracting hidden patterns, machine learning applies these patterns for
predictive modeling.
Machine learning serves as a bridge between AI, data science, and statistics, enabling intelligent
decision-making, data-driven insights, and predictive analytics across various industries.
Learning Process | Requires training using algorithms to fit data and minimize errors. | Can be detected through statistical analysis or visualization techniques.
Feature | Classification | Clustering
Learning Type | Supervised Learning | Unsupervised Learning
Objective | Predicts the class label of new data points based on prior knowledge. | Identifies natural groupings or structures in the data.
Input Data | Requires labelled data for training (i.e., data with known class labels). | Works with unlabelled data, discovering patterns without prior labels.
Output | A discrete class label (e.g., "Spam" or "Not Spam"). | A cluster assignment, where similar data points are grouped together.
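The contrast between classification and clustering can be sketched in plain Python. This is a minimal illustration under stated assumptions: the 1-D feature values are invented, the classifier is a simple nearest-centroid rule, and the clustering is a two-cluster k-means. Neither is prescribed by these notes.

```python
# Toy 1-D feature values (hypothetical, e.g. count of suspicious words).
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# Classification (supervised): class labels are known for the training data.
labels = ["Not Spam", "Not Spam", "Not Spam", "Spam", "Spam", "Spam"]

def fit_centroids(xs, ys):
    """Return the mean feature value of each class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def classify(x, centroids):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

centroids = fit_centroids(points, labels)
prediction = classify(7.0, centroids)  # output is a discrete class label

# Clustering (unsupervised): no labels are used, points are only grouped.
def two_means(xs, iterations=10):
    """Assign each point to one of two clusters (0 or 1)."""
    c0, c1 = min(xs), max(xs)  # simple initial cluster centres
    assign = []
    for _ in range(iterations):
        assign = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        g0 = [x for x, a in zip(xs, assign) if a == 0]
        g1 = [x for x, a in zip(xs, assign) if a == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return assign

clusters = two_means(points)  # output is a cluster assignment per point

print(prediction, clusters)
```

The classifier needs `labels` to be trained, while `two_means` sees only `points`; that difference in input is what separates the two columns of the table.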
Feature | Labelled Data | Unlabelled Data
Definition | Data that includes both input features and corresponding output labels. | Data that contains only input features without predefined labels.
Usage | Used for tasks like classification and regression (e.g., predicting disease diagnosis from medical records). | Used for clustering, anomaly detection, and association rule learning.
Examples | A dataset where images of cats and dogs are labelled as "Cat" or "Dog." | A dataset with customer purchase history but no predefined categories.
10. Point out the differences between supervised and unsupervised learning.
Differences Between Supervised and Unsupervised Learning
Feature | Supervised Learning | Unsupervised Learning
Definition | A type of machine learning where the model is trained on labelled data (input-output pairs). | A type of machine learning where the model learns patterns and structures from unlabelled data.
Data Type | Uses labelled data (each input has a corresponding output). | Uses unlabelled data (no predefined outputs).
Objective | To map inputs to known outputs and make accurate predictions. | To find hidden patterns, relationships, or clusters within the data.
Learning Process | The model learns from past examples and generalizes to new data. | The model discovers structures or groupings without prior knowledge.
1. Definition:
2. Output Type:
3. Algorithms Used:
4. Evaluation Metrics:
2. Regression: Mean Squared Error (MSE), R-squared, Mean Absolute Error (MAE).
5. Real-World Applications:
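The regression metrics named under Evaluation Metrics (MSE, MAE, R-squared) can be computed directly from their standard definitions. The sketch below uses made-up true and predicted values; the function names are illustrative, not from any particular library.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

Lower MSE and MAE indicate smaller errors, while an R-squared closer to 1 indicates that the predictions explain more of the variation in the observed values.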
Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that lies between supervised and
unsupervised learning. It uses both labeled and unlabeled data to train a model.
Key Points:
6. Uses Labeled & Unlabeled Data → A small amount of labeled data and a large amount of
unlabeled data.
7. Bridges Supervised & Unsupervised Learning → Leverages labeled data to guide learning
from unlabeled data.
9. Improves Model Accuracy → More data (even if unlabeled) helps the model generalize
better.
10. Applications → Medical diagnosis, speech recognition, fraud detection, and web content
classification.
Example:
11. Google Photos: It learns from a few labeled images (faces you tag) and then categorizes
other similar faces automatically.
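One common way to use a few labeled points to guide learning from unlabeled ones is self-training (pseudo-labeling). The sketch below is a minimal version under stated assumptions: the 1-D data, the nearest-centroid rule, and the distance `threshold` are hypothetical choices, not the method used by any specific product.

```python
# Hypothetical data: a small labeled set and a larger unlabeled set.
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5]

def centroids_of(pairs):
    """Mean feature value per label from (value, label) pairs."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def self_train(labeled, unlabeled, threshold=1.0):
    """Pseudo-label unlabeled points that lie within `threshold` of the
    nearest class centroid, then refit and repeat until nothing changes."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        cents = centroids_of(labeled)
        confident = []
        for x in remaining:
            label = min(cents, key=lambda y: abs(x - cents[y]))
            if abs(x - cents[label]) <= threshold:
                confident.append((x, label))
        if not confident:
            break  # no point is close enough to be labeled confidently
        labeled.extend(confident)
        taken = {x for x, _ in confident}
        remaining = [x for x in remaining if x not in taken]
    return labeled, remaining

final_labeled, still_unlabeled = self_train(labeled, unlabeled)
print(final_labeled, still_unlabeled)
```

In the first pass only the points nearest the two labeled examples receive labels; the centroids then shift, which lets the remaining points be labeled in the next pass.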
13. List out the differences between reinforcement learning and supervised learning.
Feature | Reinforcement Learning | Supervised Learning
Definition | Learns by interacting with the environment and receiving rewards. | Learns from labeled data with correct input-output pairs.
Data Requirement | No labeled data, learns from experience. | Requires a large labeled dataset.
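Learning from rewards rather than from labeled examples can be sketched with a tiny two-action environment. Everything here is a hypothetical illustration: the action names, the fixed reward values, and the simple "explore first, then exploit" schedule.

```python
# Hypothetical environment: each action yields a fixed reward that the
# agent does not know in advance.
REWARDS = {"left": 0.2, "right": 1.0}

def step(action):
    """The environment returns a reward for the chosen action."""
    return REWARDS[action]

def run_agent(episodes=20, explore_steps=4):
    """Estimate the value of each action purely from received rewards."""
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}  # estimated reward per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for t in range(episodes):
        if t < explore_steps:
            action = actions[t % len(actions)]     # explore: try each in turn
        else:
            action = max(actions, key=value.get)   # exploit: best estimate
        reward = step(action)
        count[action] += 1
        # Running mean of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = run_agent()
best_action = max(value, key=value.get)
print(value, count, best_action)
```

No labeled input-output pairs appear anywhere: the agent only sees the reward that follows each action, which is the key contrast with supervised learning.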
15. How is machine learning linked to AI, Data Science, and Statistics?
2. Dimensionality Reduction
Dimensionality reduction algorithms are examples of
unsupervised algorithms. They take higher-dimensional data as
input and output the data in a lower dimension by taking
advantage of the variance of the data. The task is to reduce
the dataset to fewer features without losing generality.
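As a rough illustration of using variance to reduce dimensions, the sketch below projects 2-D points onto the single direction of greatest variance, which is the idea behind Principal Component Analysis (PCA). PCA is an assumed example technique here, and the data points are made up; the closed-form angle is valid only for the 2-D case.

```python
import math

# Hypothetical 2-D data lying along the line y = x.
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

def pca_to_1d(points):
    """Project 2-D points onto their direction of maximum variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Angle of the principal axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Each point becomes one number: its coordinate along that axis.
    return [x * ux + y * uy for x, y in centered]

reduced = pca_to_1d(data)
print(reduced)
```

Each point is now described by one feature instead of two. Because these points lie exactly on a line, the projection keeps all of their variance; real data would lose the part that lies off the principal axis.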
Numerical Problems
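1. Let us assume a regression algorithm generates a model
y = 0.54 + 0.66x for data pertaining to weekly sales data of
a product. Here, x is the week and y is the product sales.
Find the prediction for the 5th and 8th week.
Solution:
y = 0.54 + 0.66x
For the 5th week (x = 5): y = 0.54 + 0.66 × 5 = 0.54 + 3.30 = 3.84
For the 8th week (x = 8): y = 0.54 + 0.66 × 8 = 0.54 + 5.28 = 5.82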
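The predictions for this problem can be checked with a short Python sketch. The function name `predict_sales` is illustrative; the intercept 0.54 and slope 0.66 come from the model given in the problem.

```python
def predict_sales(week, intercept=0.54, slope=0.66):
    """Evaluate the linear model y = intercept + slope * x for a given week."""
    return intercept + slope * week

week5 = predict_sales(5)  # 0.54 + 0.66 * 5
week8 = predict_sales(8)  # 0.54 + 0.66 * 8

# Round for display, since floating-point sums may carry tiny errors.
print(round(week5, 2), round(week8, 2))
```

Rounded to two decimals, the model predicts sales of 3.84 for the 5th week and 5.82 for the 8th week.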
• Healthcare
• Disease Diagnosis and Prediction
• Finance
• Fraud Detection
• Retail and E-commerce
• Recommendation Systems
• Transportation and Autonomous Vehicles
• Autonomous Driving
• Natural Language Processing (NLP)
• Sentiment Analysis