Mubbashir Assignment ML

ML

Uploaded by

Majeed Mehmood

Name : Mubbashir Ahmed.

Section : A (Evening)

Roll No : 49

Seat No : 18252043

Teacher : Miss. Noshaba


1. What is Machine Learning?
Answer:
Machine learning is a branch of artificial intelligence that allows computers to
learn from data and make decisions without being explicitly programmed for each
task. Instead of following a fixed set of instructions, the computer analyzes
data, identifies patterns, and uses these patterns to solve problems or make
predictions.

For example:
If you show a machine learning program lots of pictures of cats and dogs, it can
learn to recognize which is which on its own. Over time, as it sees more
examples, it gets better at making accurate guesses.

2. What are the types of Machine Learning?


Answer:
There are three main types of machine learning:

Supervised Learning
• The machine is trained on labeled data, meaning the input data is paired
with the correct output.
• Goal: Learn a mapping from input to output so it can predict outputs for
new, unseen inputs.
• Example: Predicting house prices (input: size, location; output: price).
• Example: Classifying emails as "spam" or "not spam."

Unsupervised Learning
• The machine is given data without labels and must find patterns or
structure within it.
• Goal: Discover hidden patterns or groupings in the data.
• Example: Customer segmentation (grouping customers with similar buying
habits).
• Example: Detecting anomalies like fraud in transactions.

Reinforcement Learning
• The machine learns by interacting with an environment, receiving
rewards or penalties for actions.
• Goal: Find the best strategy to maximize rewards over time.
• Example: Training robots to walk.
• Example: Teaching AI to play games like chess or video games.

3. What is the difference between AI, ML, and Deep Learning?
Answer:

Aspect | AI | ML | DL
Definition | Broad field simulating intelligence. | Subset of AI focused on learning from data. | Subset of ML using neural networks.
Complexity | High-level tasks like reasoning. | Focus on algorithms and predictions. | Uses large, layered neural networks.
Data Dependency | Can work without large data. | Needs structured data. | Requires vast amounts of data.
Examples | Chess-playing AI. | Fraud detection. | Self-driving cars, facial recognition.

4. What are features and labels?


Answer:
In machine learning, features and labels are key components used to train a
model:
Features
Features are the input variables or characteristics of the data that the model uses
to make predictions. They represent measurable attributes or properties of the
data.
For predicting house prices:
Features: Size of the house, number of bedrooms, location, age of the house,
etc.
For email spam detection:
Features: Number of links in the email, presence of certain words, email length,
etc.
Purpose:
Features help the model understand patterns in the data.
Labels
Labels are the output or target variable that the model is trying to predict or
classify. They represent the answer or ground truth in the data.
For predicting house prices:
Label: The actual price of the house.
For email spam detection:
Label: Whether the email is spam ("Spam" or "Not Spam").
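As a small sketch of how features and labels might be stored in code, the house price example can be written as two parallel lists. The numbers for bedrooms and age are made up for illustration, and the names `features` and `labels` are only a convention assumed here.

```python
# Hypothetical toy data for illustration only (not real measurements).
# Each row of `features` describes one house:
# [size in sq. ft., number of bedrooms, age in years].
features = [
    [1500, 3, 10],
    [1800, 3, 5],
    [2400, 4, 20],
]

# `labels` holds the known answer (the price) for each row of `features`.
labels = [250_000, 300_000, 350_000]

# A supervised dataset pairs every feature row with exactly one label.
for row, price in zip(features, labels):
    print(f"features={row} -> label={price}")
```

The model only ever sees `features` when it predicts; `labels` are used during training and evaluation to check its answers.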
5. What is overfitting and underfitting?
Answer:
In machine learning, overfitting and underfitting are two common problems
that can cause a model to perform poorly:
Overfitting
The model performs well on training data but poorly on new data. This can
happen when the model is too complex, or when it learns the details and noise in
the training data instead of the underlying pattern.
Underfitting
The model performs poorly even on the training data, so it cannot generalize
to new data either. This can happen when the model is too simple, or when it has
not been trained for long enough or on enough data.
The goal in machine learning is to create a model that generalizes well to new
data, meaning it makes good predictions on both training and new data. A
well-fitted model captures the dominant trend in both seen and unseen data
sets.
To avoid overfitting, you can stop training the model when the error on a
held-out validation set starts to increase, even if the training error is still
falling (early stopping). You can also use regularization techniques, which
penalize model complexity so that the model relies on simpler patterns.
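The two failure modes can be seen in a small experiment. The sketch below is only an illustration under assumed conditions: the data follows a made-up trend y = 2x plus noise, the "memorizer" stands in for an overly complex model, and the "constant" model stands in for an overly simple one.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_data(n):
    """Noisy samples of the hypothetical trend y = 2x."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(30)
test_x, test_y = make_data(30)


def memorizer(x):
    """Overfit: return the label of the closest training point, noise included."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


mean_y = sum(train_y) / len(train_y)


def constant(x):
    """Underfit: ignore x and always predict the mean training label."""
    return mean_y


for name, model in [("memorizer", memorizer), ("constant", constant)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "test MSE:", round(mse(model, test_x, test_y), 2))
```

The memorizer has zero error on the training set but a nonzero error on the new points, which is the overfitting pattern. The constant model has a large error on both sets, which is the underfitting pattern.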

6. What are common algorithms used in ML?


Answer:
Here are some common machine learning algorithms, summarized by key points:

1. Supervised Learning Algorithms

Logistic Regression: Used for binary classification.
K-Nearest Neighbors (KNN): Classifies data based on the majority of K nearest
neighbors.
Support Vector Machines (SVM): Finds the hyperplane that best separates
classes.
Decision Trees: Splits data into subsets based on feature values.
Random Forest: Ensemble of decision trees, reduces overfitting.
Gradient Boosting (XGBoost, LightGBM): Sequentially builds trees, each
correcting previous errors.
Naive Bayes: Probabilistic classifier based on Bayes' theorem that assumes features are independent given the class.
Neural Networks: Deep learning model for complex patterns, used in both
classification and regression.
2. Unsupervised Learning Algorithms

K-Means: Partitions data into K clusters based on feature similarity.
Hierarchical Clustering: Builds a tree of clusters, useful for visualizations.
DBSCAN: Density-based clustering that detects arbitrarily shaped clusters.
PCA (Principal Component Analysis): Reduces dimensionality by
transforming features to principal components.

3. Reinforcement Learning Algorithms

Q-Learning: Model-free, learns optimal actions to maximize reward.
Deep Q-Networks (DQN): Deep learning-based Q-Learning for complex
environments.
Policy Gradient Methods: Directly optimizes the policy for decision-making.
4. Ensemble Learning
Bagging: Combines multiple models to reduce variance (e.g., Random Forest).
Boosting: Sequential models that improve predictions (e.g., AdaBoost, Gradient
Boosting).
Stacking: Combines multiple models via a meta-model.

5. Deep Learning

CNN (Convolutional Neural Networks): Used for image and video recognition.
RNN (Recurrent Neural Networks): Used for sequential data (e.g., text, time
series).
LSTM: A type of RNN that handles long-term dependencies.
GANs (Generative Adversarial Networks): Generates new data by training
two networks (generator & discriminator).
These algorithms are widely used in various ML tasks, from classification and
regression to clustering and reinforcement learning.
7. What is the difference between classification and regression?
Answer:
The main difference between classification and regression is that classification
predicts discrete categories or classes, while regression predicts continuous
quantities:
Classification
The goal is to assign input data to specific categories. The output is typically a
label or class from a set of predefined options. For example, classification can
predict whether an email is spam or not.
Regression
The goal is to establish a relationship between input variables and the output.
The output is a real-valued number that can vary within a range. For example,
regression can predict a person's height based on their weight, gender, diet, or
subject major.

8. Describe a Linear Regression model with an example.
Answer:
A Linear Regression model is used to predict a continuous target variable
(dependent variable) based on one or more input features (independent
variables). It assumes a linear relationship between the variables.
Key Equation:
For simple linear regression (one feature):
y = β0 + β1 x
Where:
• y = predicted value (target),
• x = feature (independent variable),
• β0 = intercept (value of y when x = 0),
• β1 = coefficient (slope: how much y changes for a unit change in x).
Example: Predicting House Price
Suppose we want to predict house price based on its size (in sq. ft.):
Data:
Size (sq. ft.) = [1500, 1800, 2400, 3000]
Price = [250,000; 300,000; 350,000; 500,000]
Model: Price = β0 + β1 × Size
For illustration, assume that training gives:
Price = 50,000 + 100 × Size

Prediction for a house of size 2200 sq. ft.:

Price = 50,000 + 100 × 2200 = 270,000
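
The coefficients can also be estimated from the data instead of assumed. A minimal sketch, using the standard least-squares formulas β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and β0 = ȳ - β1 x̄, is given below. The function name `fit_simple_linear_regression` is chosen here for illustration and is not from any library.

```python
sizes = [1500, 1800, 2400, 3000]               # sq. ft., from the example
prices = [250_000, 300_000, 350_000, 500_000]  # from the example


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_simple_linear_regression(sizes, prices)
predicted = b0 + b1 * 2200
print(f"intercept={b0:.2f}, slope={b1:.2f}, prediction for 2200 sq. ft.={predicted:.2f}")
```

On these four points the least-squares fit comes out at an intercept of roughly 5,932 and a slope of roughly 158.19, so it predicts about 353,955 for a 2200 sq. ft. house. This differs from the illustrative coefficients 50,000 and 100 used in the worked example, which were assumed rather than fitted.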


Advantages:
1: Simple and interpretable.
2: Efficient for small datasets.
Limitations:
1: Assumes linear relationships, which may not always be true.
2: Sensitive to outliers.
Linear regression is widely used for problems involving continuous outcomes,
such as predicting prices or measurements based on other variables.

9. What is a training set and a test set?


Answer:
In machine learning, a training set and a test set are two subsets of data used to
develop and evaluate models, respectively. Here's a breakdown of each:

1. Training Set
The training set is used to train the machine learning model. It consists of input
data along with the correct labels (for supervised learning). The model learns
patterns, relationships, or representations from this data to make predictions.
Role: During training, the model adjusts its internal parameters to minimize
errors in its predictions based on the training data.
Composition: It typically makes up the larger portion of the available dataset,
often 70%-80% of the total data.
Example: In a spam email classifier, the training set would contain labeled
examples of emails, some marked as spam and others as not spam. The model
uses these examples to learn how to identify new emails as either spam or not.

2. Test Set
The test set is used to evaluate the performance of the trained model. It contains
data that the model has not seen before during training. This helps to simulate
how the model will perform on new, unseen data.
Role: By testing the model on this separate set of data, we get an indication of
its generalization ability—how well it can make predictions on data it wasn't
trained on.
Composition: It typically makes up the remaining 20%-30% of the total dataset.
Example: After training the spam email classifier on the training set, you would
test the model's accuracy by applying it to the test set, which contains emails the
model has never seen before.
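
A split like this can be sketched in a few lines. The helper below is hand-written for illustration (it is not the scikit-learn function of the same name), and the ten "emails" are placeholder records invented for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows`, then cut it into (train, test) lists."""
    shuffled = list(rows)
    # Shuffling first avoids a biased split when the data is ordered.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder dataset: (email id, label) pairs, made up for this sketch.
emails = [(f"email_{i}", "spam" if i % 3 == 0 else "not spam") for i in range(10)]

train_set, test_set = train_test_split(emails, test_fraction=0.2)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

Every record ends up in exactly one of the two sets, so the test set contains only emails the model did not see during training.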
10. What is feature scaling?
Answer:
Feature scaling is the process of transforming the numerical features in a
dataset to a common scale, so that no feature dominates the model simply because
its values are larger. Two common approaches are normalization, which rescales
values to a fixed range such as 0 to 1, and standardization, which centers
values around 0 with unit variance. Scaling is especially important for
algorithms that rely on distance calculations, such as K-Nearest Neighbors, and
for algorithms trained with gradient descent, such as neural networks.
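
Two common ways to scale a feature are min-max scaling, x' = (x - min) / (max - min), and standardization, x' = (x - mean) / std. The sketch below applies both to the house sizes from the linear regression question. The function names are chosen here for illustration, and the code assumes the values are not all equal (otherwise it would divide by zero).

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


sizes = [1500, 1800, 2400, 3000]  # house sizes in sq. ft.
print(min_max_scale(sizes))
print(standardize(sizes))
```

After min-max scaling the sizes lie between 0 and 1, so they can be compared with other features such as the number of bedrooms without the larger numbers dominating a distance calculation.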

11. Difference between supervised and unsupervised machine learning with example?
Answer:

Aspect | Supervised Learning | Unsupervised Learning
Input data | Uses known, labeled data as input. | Uses unlabeled data as input.
Computational complexity | Generally less computationally complex. | Generally more computationally complex.
Real-time | Usually uses offline analysis. | Can use real-time analysis of data.
Number of classes | The number of classes is known. | The number of classes is not known in advance.
Accuracy of results | Generally accurate and reliable results. | Moderately accurate and reliable results.
Output data | The desired output is given. | The desired output is not given.
Model | It is generally not possible to learn models as large and complex as in unsupervised learning. | It is possible to learn larger and more complex models than in supervised learning.
Training data | Labeled training data is used to infer the model. | No labeled training data is used.
Typical tasks | Classification and regression. | Clustering.
Test of model | The model can be tested against known labels. | The model cannot be tested against known labels.
Example | Optical Character Recognition. | Find a face in an image.

12. Describe the KNN algorithm with an example.
Answer:
The k-Nearest Neighbors (k-NN) algorithm is a simple, non-parametric, and
versatile machine learning algorithm used for classification and regression tasks.
It relies on the principle that similar data points are often located near each other
in feature space.

How k-NN Works:

Training Phase:
k-NN does not explicitly build a model during training. Instead, it memorizes
the training dataset.
Prediction Phase:
For a new input point:
1. Calculate the distance between the input point and all points in the training
dataset. Common distance metrics include Euclidean distance, Manhattan
distance, and Hamming distance.
2. Identify the k nearest neighbors of the input point.
3. For classification, assign the class chosen by majority vote of the k nearest
neighbors. For regression, predict the average of the neighbors' values.
Example: Classification with k-NN
Suppose we have the following dataset of animals:
Weight (kg) Height (cm) Class
40 50 Dog
30 45 Dog
70 90 Horse
80 100 Horse
50 60 Dog

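Goal: Predict the class of an animal with weight = 60 kg and height = 75 cm
using k = 3.
1. Calculate distances:
Distance formula:
Euclidean Distance = √((x2 - x1)² + (y2 - y1)²)
Compute the distance of the new point (60, 75) from each data point:
To (40, 50): √((60 - 40)² + (75 - 50)²) = √1025 ≈ 32.02
To (30, 45): √((60 - 30)² + (75 - 45)²) = √1800 ≈ 42.43
To (70, 90): √((60 - 70)² + (75 - 90)²) = √325 ≈ 18.03
To (80, 100): √((60 - 80)² + (75 - 100)²) = √1025 ≈ 32.02
To (50, 60): √((60 - 50)² + (75 - 60)²) = √325 ≈ 18.03
2. Find the k (3) nearest neighbors:
The two nearest points are (70, 90) and (50, 60). The third neighbor is either
(40, 50) or (80, 100), which are tied at the same distance.
3. Classify based on majority voting:
Breaking the tie in favor of (40, 50), the classes of the neighbors are Horse, Dog, Dog.
Majority class: Dog.
Prediction: The new animal is classified as a Dog.
Note: If the tie were broken in favor of (80, 100) instead, the neighbors would be
Horse, Dog, Horse and the prediction would be Horse. The result in this example
therefore depends on the tie-breaking rule.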
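
The worked example can be reproduced with a short sketch. It is a minimal illustration rather than a full implementation: it ranks neighbors by squared Euclidean distance (which gives the same order as the true distance), and it assumes that tied points are taken in dataset order, which is how (40, 50) ends up ahead of (80, 100).

```python
from collections import Counter

# (weight in kg, height in cm) -> class, from the table in the example.
training_data = [
    ((40, 50), "Dog"),
    ((30, 45), "Dog"),
    ((70, 90), "Horse"),
    ((80, 100), "Horse"),
    ((50, 60), "Dog"),
]


def squared_distance(p, q):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_classify(point, data, k):
    """Return the majority class among the k training points closest to `point`."""
    # sorted() is stable, so equally distant points keep their dataset order.
    # That tie-breaking rule is an assumption of this sketch.
    neighbors = sorted(data, key=lambda item: squared_distance(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_classify((60, 75), training_data, k=3))  # prints "Dog"
```

With a different tie-breaking rule the third neighbor could be (80, 100), which would change the vote, so in practice it helps to pick k or the distance metric so that such ties are unlikely.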

13. What are some common metrics to evaluate ML models?


Answer:
Common metrics used to evaluate machine learning models include accuracy,
precision, recall, F1-score, the confusion matrix, area under the ROC curve
(AUC), log loss, mean squared error (MSE), mean absolute error (MAE), and root
mean squared error (RMSE). The choice of metric depends on the type of problem
(classification or regression) and on the data distribution (balanced or
imbalanced).
For Classification Problems:

• Accuracy: Overall proportion of correct predictions.
• Precision: Proportion of positive predictions that are actually positive.
• Recall: Proportion of actual positive cases that are correctly identified.
• F1-Score: Harmonic mean of precision and recall, balancing both
metrics.
• Confusion Matrix: A table counting true positives, false positives,
true negatives, and false negatives.
• ROC Curve (Receiver Operating Characteristic): Visualizes the
trade-off between true positive rate and false positive rate.
• AUC-ROC (Area Under the Curve): The area under the ROC curve, a single
number summarizing how well the model separates the classes across thresholds.
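
As a sketch of how the first few classification metrics follow from the confusion matrix counts, the function below computes them for a binary problem. The label lists are made up for illustration, and the function name `classification_metrics` is an assumption of this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Confusion matrix counts.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn}}


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Here there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy, precision, recall, and F1 all come out to 0.75.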

For Regression Problems:

• Mean Squared Error (MSE): Average squared difference between
predicted and actual values.
• Root Mean Squared Error (RMSE): Square root of MSE, which expresses the
error in the same units as the target.
• Mean Absolute Error (MAE): Average absolute difference between
predicted and actual values.
• R-squared (R²): Proportion of variance in the dependent variable
explained by the model.
• Adjusted R-squared: R-squared adjusted for the number of predictors,
penalizing predictors that do not improve the model.
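
The regression metrics can be sketched the same way. The values below are made up for illustration, and the function name `regression_metrics` is an assumption of this example. R² is computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared errors and SS_tot is the sum of squared deviations of the actual values from their mean.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "r2": 1 - ss_res / ss_tot}


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print(regression_metrics(y_true, y_pred))
```

For these values the MSE is 0.5625, the RMSE is 0.75, the MAE is 0.625, and R² is about 0.847.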
