AIML

UNIT :- 5

Unsupervised Learning

Unsupervised learning is a type of machine learning where the model is trained on unlabeled
data. Unlike supervised learning, where the model learns from labeled data, unsupervised
learning finds patterns, relationships, or structures in data without explicit guidance.

Problems in Unsupervised Learning

1. No Clear Accuracy Measure – Unlike supervised learning, there is no straightforward way to evaluate the model's accuracy.

2. Difficult to Interpret – Since there are no predefined labels, understanding and interpreting the results can be challenging.

3. Scalability Issues – Many unsupervised learning algorithms, such as clustering, struggle with large datasets due to computational complexity.

4. Overfitting – Without labeled data, models may overfit to noise rather than learning meaningful patterns.

K-Means Clustering

K-Means is an unsupervised clustering algorithm used to group data into K clusters. It works
by:

1.​ Selecting K random centroids.​

2.​ Assigning each data point to the nearest centroid.​

3.​ Recalculating centroids based on the assigned data points.​

4.​ Repeating steps 2-3 until centroids stop changing or a maximum number of iterations is
reached.​
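As a quick illustration of these steps, here is a minimal sketch using scikit-learn's KMeans; the synthetic blobs and the choice K = 3 are assumptions made purely for the example:

```python
# Minimal K-Means sketch with scikit-learn (synthetic data, K = 3 chosen for illustration).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three loose blobs of 2-D points to cluster.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in [(0, 0), (5, 5), (0, 5)]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # steps 1-4 run inside fit()
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # final centroids after convergence
print(labels[:10])              # cluster assignment of the first 10 points
```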

Use Cases:

●​ Customer segmentation​
●​ Anomaly detection​

●​ Image compression​

Challenges:

●​ Choosing the right K value is difficult.​

●​ Sensitive to outliers.​

●​ May converge to local minima.​

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique used in machine learning to transform a dataset into a lower-dimensional space while preserving as much variance as possible.

Steps of PCA:

1.​ Standardize the dataset.​

2.​ Compute the covariance matrix.​

3.​ Find the eigenvalues and eigenvectors.​

4.​ Select the top principal components.​

5.​ Transform the data into the new feature space.​
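A minimal sketch of these steps with scikit-learn, assuming a small random dataset with two correlated features (scikit-learn's PCA performs steps 2–5 internally):

```python
# Minimal PCA sketch using scikit-learn (random correlated data is an assumption for the demo).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] * 2 + rng.normal(scale=0.1, size=100)  # make two features correlated

X_std = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=2)                   # steps 2-4: covariance, eigen-decomposition, top components
X_reduced = pca.fit_transform(X_std)        # step 5: project into the new feature space

print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
print(X_reduced.shape)                # (100, 2)
```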

Applications:

●​ Image compression​

●​ Noise reduction​

●​ Feature extraction​

Advantages:
●​ Reduces computational cost.​

●​ Removes correlation among features.​

Disadvantages:

●​ Loss of interpretability.​

●​ Can discard useful information.​

Different Libraries of Python for Machine Learning

Python offers many libraries for machine learning, including:

1.​ NumPy – Provides support for large multidimensional arrays and numerical
computations.​

2.​ Pandas – Used for data manipulation and analysis.​

3.​ Matplotlib & Seaborn – Used for data visualization.​

4.​ Scikit-learn – Provides simple and efficient tools for machine learning, including
classification, regression, and clustering.​

5.​ TensorFlow & PyTorch – Used for deep learning and neural networks.​

6.​ Keras – A high-level API for deep learning, built on TensorFlow.​

7.​ Statsmodels – Used for statistical modeling and hypothesis testing.​
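A brief sketch of how a few of these libraries are commonly combined; the import aliases are conventions rather than requirements, and the tiny DataFrame is made up for the example:

```python
# Conventional import aliases for the libraries listed above.
import numpy as np                      # numerical arrays
import pandas as pd                     # tabular data manipulation
import matplotlib.pyplot as plt         # plotting
import seaborn as sns                   # statistical visualization
from sklearn.linear_model import LinearRegression  # one of scikit-learn's many estimators

df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})
model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_, model.intercept_)  # approximately [2.0] and 0.0
```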



UNIT :- 4

Supervised Learning

Supervised learning is a type of machine learning where a model is trained on labeled data. The
algorithm learns from input-output pairs and makes predictions on new data.
Problems in Supervised Learning

1.​ Requires Labeled Data – Labeling data is expensive and time-consuming.​

2.​ Overfitting – The model may learn noise instead of actual patterns.​

3.​ Computational Cost – Large datasets require high processing power.​

4.​ Bias in Data – If the training data is biased, the model may make incorrect predictions.​

5.​ Limited Generalization – The model may not perform well on unseen data.​

Classification vs. Regression


Feature Classification Regression

Definition Predicts discrete categories (e.g., Predicts continuous values (e.g.,


spam or not spam). house price).

Output Type Categorical (labels). Continuous (numerical values).

Example Logistic Regression, Decision Trees, Linear Regression, Polynomial


Algorithms SVM. Regression.

Use Cases Fraud detection, sentiment analysis. Stock price prediction,


temperature forecasting.

Linear Regression

Linear Regression is a regression algorithm that models the relationship between independent
(X) and dependent (Y) variables using a straight line:

Y = mX + b

where:

●​ m = slope (coefficient)

●​ b = intercept
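A minimal sketch of fitting this line by least squares with NumPy; the five data points are made up for illustration:

```python
# Fitting Y = mX + b with NumPy's least-squares polyfit (illustrative data).
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])  # roughly Y = 2X

m, b = np.polyfit(X, Y, deg=1)  # degree-1 polynomial = straight line
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print("prediction at X=6:", m * 6 + b)
```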

Applications:
●​ House price prediction​

●​ Sales forecasting​

Limitations:

●​ Assumes a linear relationship.​

●​ Sensitive to outliers.​

Logistic Regression

Logistic Regression is a classification algorithm used to predict categorical outcomes. Instead of a straight line, it uses the sigmoid function:

P(Y) = \frac{1}{1 + e^{-(mX + b)}}
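A short sketch of the sigmoid itself, showing why logistic regression outputs can be read as probabilities; the coefficients m and b here are assumed example values:

```python
# The sigmoid squashes any real number into (0, 1), so outputs behave like probabilities.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, b = 1.5, -2.0                 # assumed example coefficients
for x in [-2.0, 0.0, 1.5, 4.0]:
    print(x, round(sigmoid(m * x + b), 3))  # P(Y=1) rises smoothly with x
```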

Applications:

●​ Spam detection​

●​ Medical diagnosis​

Advantages:

●​ Simple and effective for binary classification.​

●​ Outputs probabilities.​

Disadvantages:

●​ Doesn't work well for non-linear relationships.​

●​ Sensitive to outliers.​

Polynomial Regression
Polynomial Regression is an extension of Linear Regression where the relationship between
variables is non-linear. It fits a polynomial equation:

Y = a_0 + a_1X + a_2X^2 + a_3X^3 + ... + a_nX^n
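A minimal sketch of a quadratic (degree-2) fit with NumPy; the synthetic curved data and the chosen degree are assumptions for the example:

```python
# Quadratic fit with NumPy (synthetic curved data, degree 2 chosen for illustration).
import numpy as np

X = np.linspace(0, 5, 20)
Y = 1.0 + 0.5 * X + 2.0 * X**2 + np.random.default_rng(1).normal(scale=0.5, size=20)

coeffs = np.polyfit(X, Y, deg=2)     # returns [a2, a1, a0], highest degree first
print(coeffs)                        # approximately [2.0, 0.5, 1.0]
print(np.polyval(coeffs, 2.5))       # predict Y at X = 2.5
```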

Applications:

●​ Weather prediction​

●​ Stock market analysis​

Advantages:

●​ Captures non-linear relationships.​

Disadvantages:

●​ Overfitting with high-degree polynomials.​

Decision Tree

A Decision Tree is a tree-like model used for both classification and regression. It splits data
based on feature conditions.

How It Works:

1.​ Select the best feature to split data (using Gini impurity or entropy).​

2.​ Split the dataset into subsets.​

3.​ Repeat until reaching a stopping condition (e.g., max depth).​
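A minimal sketch of these steps with scikit-learn; the Iris dataset and max_depth=3 are assumptions used purely as a stand-in:

```python
# Decision-tree sketch with scikit-learn (Iris dataset used purely as a stand-in).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)  # Gini splits, depth as stopping condition
tree.fit(X, y)

print(export_text(tree))          # the learned feature-threshold splits, as text
print(tree.predict(X[:3]))        # class predictions for the first three samples
```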

Advantages:

●​ Easy to understand.​

●​ Handles both numerical and categorical data.​

Disadvantages:
●​ Prone to overfitting.​

●​ Unstable with small data changes.​

Random Forest

Random Forest is an ensemble learning method that uses multiple decision trees to improve
accuracy.

How It Works:

1.​ Create multiple decision trees using random subsets of data.​

2.​ Combine the outputs using majority voting (for classification) or averaging (for
regression).​
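A minimal sketch of this ensemble with scikit-learn; the Iris dataset, the 70/30 split, and 100 trees are assumptions for the example:

```python
# Random-forest sketch: many trees on random subsets, predictions aggregated by voting.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees, each on a bootstrap sample
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))  # aggregated (voted) predictions
```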

Advantages:

●​ Reduces overfitting.​

●​ Handles missing values well.​

Disadvantages:

●​ Computationally expensive.​

●​ Hard to interpret compared to a single tree.​

Naïve Bayes

Naïve Bayes is a probabilistic classifier based on Bayes' Theorem:

P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}

It assumes that features are independent, which simplifies calculations.
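A minimal sketch for the spam-filtering use case below, assuming a tiny made-up corpus; word counts are the features, and each word is treated as independent given the class:

```python
# Naive Bayes sketch for text classification (tiny made-up spam corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "cheap money offer", "lunch with team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

X = CountVectorizer().fit_transform(texts)   # word counts; features assumed independent
clf = MultinomialNB().fit(X, labels)
print(clf.predict_proba(X[:1]))              # [P(not spam), P(spam)] for the first message
```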

Applications:
●​ Spam filtering​

●​ Sentiment analysis​

Advantages:

●​ Fast and efficient.​

●​ Works well with small datasets.​

Disadvantages:

●​ Assumes feature independence (which may not always be true).​

Support Vector Machine (SVM)

SVM is a classification algorithm that finds the optimal hyperplane to separate data points.

Key Concepts:

●​ Margin: The distance between the hyperplane and the closest points.​

●​ Kernel Trick: Allows SVM to handle non-linear data.​
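To make the kernel trick concrete, here is a minimal sketch on data that is not linearly separable; the synthetic circles dataset and the RBF kernel are assumptions for the example:

```python
# SVM sketch with an RBF kernel on non-linear data (synthetic circles dataset).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)  # not linearly separable

svm = SVC(kernel="rbf", C=1.0)   # kernel trick: implicit mapping to a higher-dimensional space
svm.fit(X, y)
print("training accuracy:", svm.score(X, y))
print("support vectors:", svm.support_vectors_.shape[0])  # points that define the margin
```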

Applications:

●​ Image classification​

●​ Text categorization​

Advantages:

●​ Works well with high-dimensional data.​

●​ Robust against overfitting.​

Disadvantages:
●​ Computationally expensive for large datasets.​

●​ Difficult to tune hyperparameters.​


UNIT :- 3

Concept of Probability and Its Types

Probability measures the likelihood of an event occurring, represented as a number between 0 and 1.

P(A) = \frac{\text{Favorable Outcomes}}{\text{Total Outcomes}}

Types of Probability:

1.​ Classical Probability – Assumes all outcomes are equally likely. (e.g., rolling a fair die)​

2.​ Empirical Probability – Based on observations from experiments.​

3.​ Subjective Probability – Based on personal judgment or intuition.​

4.​ Conditional Probability – The probability of event A occurring given that B has already
happened.​

Descriptive vs. Inferential Statistics


| Feature | Descriptive Statistics | Inferential Statistics |
| --- | --- | --- |
| Definition | Summarizes and organizes data. | Draws conclusions from data. |
| Techniques | Mean, median, mode, standard deviation. | Hypothesis testing, confidence intervals. |
| Purpose | Describes a dataset. | Makes predictions about a population. |
| Example | Average height of students in a class. | Predicting election results from a sample. |

Types of Inferential Statistics

1.​ Estimation​

○​ Point Estimation – A single value estimate (e.g., sample mean).​

○​ Interval Estimation – A range of values (e.g., confidence intervals).​

2.​ Hypothesis Testing​

○​ Null Hypothesis (H_0) – No effect or relationship exists.​

○​ Alternative Hypothesis (H_1) – A significant effect exists.​

○​ Uses t-tests, chi-square tests, ANOVA, etc.​

3.​ Regression Analysis – Determines relationships between variables.​

4.​ ANOVA (Analysis of Variance) – Compares multiple group means.​

5.​ Chi-Square Test – Tests relationships between categorical variables.​

Random Variables and Its Types

A random variable represents numerical outcomes of a random experiment.

Types:

1.​ Discrete Random Variable – Takes countable values (e.g., number of heads in coin
flips).​

2.​ Continuous Random Variable – Takes infinite values within a range (e.g.,
temperature).​
Central Limit Theorem (CLT) and Its Rules

The Central Limit Theorem (CLT) states that the distribution of the sample mean approaches a
normal distribution as the sample size increases, regardless of the population distribution.

Rules of CLT:

1.​ The sample size should be sufficiently large (n ≥ 30).​

2.​ The population can be of any distribution, but the sample mean will be approximately
normal.​

3.​ The mean of the sample distribution equals the population mean (μ).​

4.​ The standard deviation of the sample mean is given by:

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
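A small simulation sketch of the theorem, assuming an exponential (strongly skewed) population and samples of size n = 30:

```python
# CLT sketch: means of samples from a skewed population still behave like a normal distribution.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, definitely not normal

sample_means = [rng.choice(population, size=30).mean() for _ in range(5_000)]  # n = 30

print("population mean mu:", population.mean().round(3))
print("mean of sample means:", np.mean(sample_means).round(3))    # close to mu
print("std of sample means:", np.std(sample_means).round(3))      # close to sigma / sqrt(30)
print("sigma / sqrt(n):", (population.std() / np.sqrt(30)).round(3))
```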

Sampling Distribution and Its Types

A sampling distribution is the probability distribution of a statistic based on repeated samples from a population.

Types:

1.​ Sampling Distribution of the Mean – Distribution of sample means.​

2.​ Sampling Distribution of the Proportion – Distribution of sample proportions.​

3.​ t-Distribution – Used when the sample size is small.​

4.​ Chi-Square Distribution – Used for variance estimation.​

Cross-Validation and Its Types

Cross-validation is a technique used to evaluate machine learning models by splitting data into
training and testing sets multiple times.

Types:

1.​ K-Fold Cross-Validation – Splits data into K subsets and trains the model K times.​
2.​ Leave-One-Out Cross-Validation (LOO-CV) – Each observation is used as a test set
while the rest form the training set.​

3.​ Stratified K-Fold – Ensures each fold has the same class proportion.​

4.​ Time-Series Cross-Validation – Used for time-dependent data, preserving chronological order.​
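A minimal sketch of stratified K-fold cross-validation with scikit-learn; the 5 folds, logistic regression model, and Iris dataset are assumptions for the example:

```python
# K-fold cross-validation sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # each fold keeps the class proportions

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores)               # one accuracy per fold
print(scores.mean())        # averaged estimate of generalization performance
```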

Bayes’ Theorem and Its Importance

Bayes’ Theorem describes the probability of an event based on prior knowledge of related
conditions.

P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}

Importance:

1.​ Used in spam filtering (probability of spam given specific words).​

2.​ Applied in medical diagnosis (probability of a disease given symptoms).​

3.​ Essential for machine learning models in probabilistic reasoning.​

4.​ Forms the foundation of Naïve Bayes classifiers.​



UNIT :- 2

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is the process of analyzing datasets to summarize their main
characteristics, often using visualizations and statistical techniques.

Steps in EDA:

1.​ Understanding Data – Checking data types, missing values, and distributions.​
2.​ Summary Statistics – Computing measures like mean, median, and standard deviation.​

3.​ Visualizations – Using histograms, scatter plots, box plots, etc.​

4.​ Handling Outliers – Identifying and managing extreme values.​

5.​ Correlation Analysis – Checking relationships between variables.​

Importance:

●​ Helps detect patterns and trends.​

●​ Identifies missing values and outliers.​

●​ Guides feature selection for machine learning.​

Descriptive Statistics

Descriptive statistics summarize and organize data without drawing conclusions.

Types of Descriptive Statistics:

1.​ Measures of Central Tendency – Mean, Median, Mode.​

2.​ Measures of Dispersion – Range, Variance, Standard Deviation.​

3.​ Measures of Shape – Skewness and Kurtosis.​

Difference Between Data and Histogram


| Feature | Data | Histogram |
| --- | --- | --- |
| Definition | Raw collection of facts and figures. | A graphical representation of data distribution. |
| Representation | Stored in tables, spreadsheets, databases. | Displayed using bars to represent frequency. |
| Example | List of students' ages. | A bar chart showing age distribution. |
| Purpose | Used for processing, analysis, and storage. | Used to visualize frequency distributions. |

3Ms (Mean, Median, Mode)

The 3Ms are measures of central tendency that describe the "center" of data.

1.​ Mean (Average)

\text{Mean} = \frac{\sum X}{n}
○​ Affected by outliers.​

○​ Used for numerical data with normal distribution.​

2.​ Median (Middle Value)​

○​ The middle value when data is sorted.​

○​ Not affected by outliers.​

3.​ Mode (Most Frequent Value)​

○​ The most frequently occurring value in a dataset.​

○​ Used for categorical data.​

Measure of Dispersion

Measures of dispersion describe how spread out the data is.

Types:

1.​ Range – Difference between the highest and lowest value.

Range = Max − Min

2.​ Variance – Measures how far data points deviate from the mean.

\sigma^2 = \frac{\sum (X - \mu)^2}{n}

3.​ Standard Deviation – Square root of variance, gives spread in original units.

\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}

4.​ Interquartile Range (IQR) – Measures spread within the middle 50% of data.

IQR = Q3 − Q1

5.​ Coefficient of Variation (CV) – Compares spread between different datasets.

CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100
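A short sketch computing each of these measures with NumPy; the dataset is made up for illustration:

```python
# Computing the dispersion measures above with NumPy (illustrative data).
import numpy as np

data = np.array([10, 12, 14, 15, 18, 22, 25, 30])

print("Range:", data.max() - data.min())
print("Variance:", data.var())            # population variance, divides by n
print("Std dev:", data.std())
q1, q3 = np.percentile(data, [25, 75])
print("IQR:", q3 - q1)
print("CV (%):", data.std() / data.mean() * 100)
```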

5-Number Summary (Box Plot Summary)

A 5-number summary describes key characteristics of a dataset using:

1.​ Minimum – Smallest value in the dataset.​

2.​ First Quartile (Q1) – 25th percentile.​

3.​ Median (Q2) – 50th percentile.​

4.​ Third Quartile (Q3) – 75th percentile.​

5.​ Maximum – Largest value in the dataset.​

Box Plot Components:

●​ Box – Represents IQR (middle 50% of data).​

●​ Whiskers – Extend to minimum and maximum (excluding outliers).​

●​ Outliers – Plotted as individual points beyond whiskers.​
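A minimal sketch producing the 5-number summary and a box plot; the sample data is assumed for the example:

```python
# Five-number summary and a box plot with NumPy/Matplotlib (sample data assumed).
import numpy as np
import matplotlib.pyplot as plt

data = np.array([10, 10, 18, 22, 29, 30, 34, 37, 38, 45, 46, 50, 52, 68, 70, 90, 92])

mn, q1, med, q3, mx = np.percentile(data, [0, 25, 50, 75, 100])
print("min, Q1, median, Q3, max:", mn, q1, med, q3, mx)

plt.boxplot(data)   # box = IQR, whiskers to non-outlier extremes, outliers as points
plt.show()
```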

Importance:

●​ Helps visualize data spread and skewness.​

●​ Identifies outliers easily.​

UNIT :- 1

1. Definition of AI, Applications & Explanation of One


Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that can
learn, reason, problem-solve, and make decisions.

Applications of AI:

1.​ Healthcare – AI diagnoses diseases, predicts patient outcomes, and assists in drug
discovery.​

2.​ Finance – Fraud detection, risk assessment, and algorithmic trading.​

3.​ Self-driving Cars – AI powers autonomous vehicles by recognizing objects and making
driving decisions.​

4.​ Chatbots & Virtual Assistants – Used in customer service (e.g., Siri, Alexa).​

5.​ E-commerce & Recommendation Systems – AI suggests products based on user behavior.​

6.​ Robotics – AI-driven robots automate industrial and household tasks.​

✅ Example Explanation: AI in Healthcare


●​ AI models like IBM Watson analyze medical records to assist doctors.​

●​ AI-based imaging tools detect tumors in MRIs or X-rays.​

●​ AI-powered chatbots provide basic medical guidance to patients.​

2. Problem Characteristics of AI

AI problems have unique characteristics that determine how they are solved.

Key Problem Characteristics:

1.​ Decomposability – Can the problem be broken into smaller subproblems?​

2.​ Ignorability of Steps – Do previous steps matter for the final solution?​

3.​ Solution Type – Is the best solution absolute (fixed) or relative (depends on the
scenario)?​
4.​ State vs. Path Solution – Does solving the problem require a final state or a sequence
of steps?​

5.​ Role of Knowledge – Does solving the problem require domain knowledge?​

✅ Analysis of AI Problems:
| Problem | Decomposable? | Ignore Steps? | Solution Type | State or Path? | Role of Knowledge? |
| --- | --- | --- | --- | --- | --- |
| 8-Puzzle | Yes | No | Relative | Path | Minimal |
| Chess | Yes | No | Relative | Path | High |
| Tower of Hanoi | Yes | No | Absolute | Path | Minimal |

3. Difference Between ANN and BNN


| Feature | Artificial Neural Network (ANN) | Biological Neural Network (BNN) |
| --- | --- | --- |
| Definition | A computational model mimicking human brain neurons. | The real neural network in the human brain. |
| Components | Neurons, weights, activation functions. | Neurons, synapses, axons, and dendrites. |
| Learning Type | Machine learning algorithms. | Learning through experience and neuroplasticity. |
| Processing Speed | Fast, but limited. | Extremely powerful and adaptive. |
| Flexibility | Can be trained for specific tasks. | Can learn new tasks without retraining. |

4. Types of Learning in AI

AI learns from data through different types of learning:

1. Supervised Learning
●​ Uses labeled data.​

●​ Example: Email spam detection.​

2. Unsupervised Learning

●​ Uses unlabeled data to find patterns.​

●​ Example: Customer segmentation.​

3. Reinforcement Learning

●​ Learns through rewards and penalties.​

●​ Example: AlphaGo (game-playing AI).​

4. Semi-Supervised Learning

●​ Combines both labeled and unlabeled data.​

●​ Example: Google Photos automatically tagging people.​

5. Difference Between Supervised and Unsupervised Learning


| Feature | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Definition | Learns from labeled data. | Learns from unlabeled data. |
| Goal | Predicts outcomes. | Finds patterns in data. |
| Example | Spam detection. | Clustering customers. |
| Algorithms | Decision Trees, SVM, Neural Networks. | K-Means, PCA, Autoencoders. |

6. Elements of Data Science

1.​ Data Collection – Gathering raw data from various sources.​


2.​ Data Cleaning – Removing missing values and inconsistencies.​

3.​ Exploratory Data Analysis (EDA) – Understanding data distribution.​

4.​ Feature Engineering – Selecting and transforming features.​

5.​ Model Building – Using ML algorithms for predictions.​

6.​ Evaluation & Deployment – Checking performance and deploying models.​

7. Data Visualization Techniques

Data visualization helps interpret complex datasets using graphical representation.

Common Techniques:

1.​ Bar Chart – Compares categories.​

2.​ Histogram – Shows frequency distribution.​

3.​ Box Plot – Displays distribution and outliers.​

4.​ Scatter Plot – Shows relationships between variables.​

5.​ Heatmap – Represents data density using colors.​

PREVIOUS MID SEM PAPER


1. What is AI? List out the types of AI and Explain them in detail. (3 Marks)

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that can
perform tasks that typically require human intelligence, such as problem-solving,
decision-making, learning, and understanding language.

Types of AI
AI is classified into the following types:

1.​ Based on Capability​

○​ Narrow AI (Weak AI): Designed for specific tasks (e.g., chatbots, recommendation systems).​

○​ General AI (Strong AI): Machines that can perform any intellectual task like
humans (still theoretical).​

○​ Super AI: Hypothetical AI surpassing human intelligence in all aspects.​

2.​ Based on Functionality​

○​ Reactive Machines: No memory, only react to situations (e.g., IBM's Deep Blue).​

○​ Limited Memory: Can use past data for decision-making (e.g., self-driving cars).​

○​ Theory of Mind AI: Can understand emotions and thoughts (under research).​

○​ Self-Aware AI: AI with its own consciousness (hypothetical).​

2. Differentiate Artificial Intelligence and Machine Learning. (3 Marks)


| Feature | Artificial Intelligence (AI) | Machine Learning (ML) |
| --- | --- | --- |
| Definition | AI is a broad field that enables machines to mimic human intelligence. | ML is a subset of AI that allows machines to learn from data without explicit programming. |
| Purpose | Decision-making and problem-solving | Learning from patterns in data |
| Techniques Used | Includes ML, deep learning, expert systems, etc. | Includes supervised, unsupervised, and reinforcement learning |
| Example | Chatbots, Robotics, Self-driving cars | Recommendation systems, Fraud detection |

3. Roll two dice and observe two numbers X and Y. (3 Marks)

The sample space for rolling two dice contains 6 × 6 = 36 possible outcomes.

(a) Find P(X=2, Y=6)

Only one outcome satisfies this condition: (2, 6).

P(X=2, Y=6) = \frac{1}{36}

(b) Find P(X>3 | Y=2)

Given that Y=2, the possible values for X are {1, 2, 3, 4, 5, 6}.
Favorable cases for X>3 are {4, 5, 6}, which are 3 cases.
Total cases where Y=2 are 6.

P(X>3 \mid Y=2) = \frac{3}{6} = \frac{1}{2}

4. Discuss Poisson Distribution. (3 Marks)

Poisson Distribution models the probability of a given number of events occurring in a fixed
interval of time or space, assuming the events occur independently and at a constant rate.

Formula:

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}

Where:

●​ k = number of occurrences

●​ λ = average rate of occurrence

●​ e = Euler’s number (≈ 2.718)

Example Application: Number of customer arrivals at a bank per minute.
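A minimal sketch of this example with SciPy, assuming an arrival rate of λ = 3 customers per minute:

```python
# Poisson sketch with SciPy: P(X = k) for an assumed rate of lambda = 3 arrivals per minute.
from scipy.stats import poisson

lam = 3  # average customer arrivals per minute (assumption for the example)
for k in range(6):
    print(k, round(poisson.pmf(k, lam), 4))   # e^{-lam} * lam^k / k!

print("P(X <= 2):", round(poisson.cdf(2, lam), 4))
```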

5. List down applications of AI and Explain one in detail. (2 Marks)

Applications of AI:

●​ Healthcare (Diagnosis, Medical Imaging, Drug Discovery)​

●​ Finance (Fraud Detection, Algorithmic Trading)​


●​ E-commerce (Personalized Recommendations, Chatbots)​

●​ Automobile (Self-Driving Cars, Traffic Management)​

●​ Education (Automated Grading, Smart Tutors)​

Detailed Explanation: AI in Healthcare​


AI helps in diagnosing diseases using image analysis (e.g., detecting tumors in MRI scans). It
also assists in predicting disease outbreaks and developing drugs faster. AI-powered chatbots
provide preliminary medical advice, reducing the burden on healthcare professionals.

Q.2 (A) Discuss the difference between descriptive and inferential statistics. (3 Marks)

| Aspect | Descriptive Statistics | Inferential Statistics |
| --- | --- | --- |
| Purpose | Summarizes data | Makes predictions or generalizations about a population |
| Techniques | Measures of central tendency (mean, median, mode), dispersion (variance, standard deviation) | Hypothesis testing, confidence intervals, regression analysis |
| Example | Average test scores of a class | Predicting exam performance based on a sample |

OR

Q.2 (A) State the central limit theorem.

The Central Limit Theorem (CLT) states that, regardless of the population distribution, the distribution of the sample mean will approach a normal distribution as the sample size increases (typically n > 30).

\text{If } X_1, X_2, ..., X_n \text{ are i.i.d. with mean } \mu \text{ and variance } \sigma^2, \text{ then } \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0,1) \text{ as } n \to \infty.

Q.2 (B) Define Exploratory Data Analysis and explain its importance in data
analysis.
Exploratory Data Analysis (EDA) is a statistical approach used to analyze datasets to
summarize key characteristics, identify patterns, and detect anomalies before applying machine
learning models.

Importance:

●​ Helps in data cleaning and preprocessing​

●​ Identifies missing values and outliers​

●​ Helps understand relationships between variables​

●​ Provides insights for feature engineering​

OR

Q.2 (B) Define the Range and explain its calculation methods.

Definition:​
The Range is the difference between the maximum and minimum values in a dataset.

Formula:

Range = Max Value − Min Value

Example Calculation:​
For dataset {10, 22, 45, 68, 92},

Range = 92 − 10 = 82

Q.2 (C) Calculate the quartiles and find the interquartile range (IQR) for the
given dataset.

Dataset:​
18, 34, 68, 22, 10, 92, 46, 52, 38, 29, 45, 37, 10, 30, 50, 70, 90

Step 1: Arrange in Ascending Order

10, 10, 18, 22, 29, 30, 34, 37, 38, 45, 46, 50, 52, 68, 70, 90, 92

Step 2: Calculate Quartiles

●​ Q1 (First Quartile, 25th Percentile): Median of the lower half (including the overall median): 10, 10, 18, 22, 29, 30, 34, 37, 38 → Q1 = 29

●​ Q2 (Median, 50th Percentile): Middle (9th) value → Q2 = 38

●​ Q3 (Third Quartile, 75th Percentile): Median of the upper half (including the overall median): 38, 45, 46, 50, 52, 68, 70, 90, 92 → Q3 = 52

Step 3: Calculate IQR

IQR = Q3 − Q1 = 52 − 29 = 23
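A quick check of these values with NumPy; note that quartile conventions vary slightly between textbooks, but NumPy's default (linear interpolation) agrees with the median-inclusive method used above:

```python
# Verifying the quartiles with NumPy's default percentile method.
import numpy as np

data = [18, 34, 68, 22, 10, 92, 46, 52, 38, 29, 45, 37, 10, 30, 50, 70, 90]
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3, "IQR =", q3 - q1)   # 29.0 38.0 52.0 IQR = 23.0
```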


Q.2 (C) Find the mean, median, mode, and standard deviation of the given
weights.

Given Data:

x_1 = 3.5, x_2 = 12.3, x_3 = 17.7, x_4 = 20.9, x_5 = 23.1 (kg)

1. Mean (Average):

\text{Mean} = \frac{\sum x_i}{n} = \frac{3.5 + 12.3 + 17.7 + 20.9 + 23.1}{5} = \frac{77.5}{5} = 15.5 \text{ kg}

2. Median (Middle Value):

Since we have 5 values (odd number), the median is the middle value:

\text{Median} = 17.7 \text{ kg}

3. Mode (Most Frequent Value):

Since all values are unique, there is no mode.

4. Standard Deviation (σ):

\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}

First, calculate the squared deviations from the mean (\bar{x} = 15.5):

(3.5 − 15.5)² = (−12)² = 144
(12.3 − 15.5)² = (−3.2)² = 10.24
(17.7 − 15.5)² = (2.2)² = 4.84
(20.9 − 15.5)² = (5.4)² = 29.16
(23.1 − 15.5)² = (7.6)² = 57.76

\sigma = \sqrt{\frac{144 + 10.24 + 4.84 + 29.16 + 57.76}{5}} = \sqrt{\frac{246}{5}} = \sqrt{49.2} \approx 7.01 \text{ kg}
Q.3 (A) Explain the difference between classification and regression models.

| Feature | Classification | Regression |
| --- | --- | --- |
| Definition | Assigns labels to data (e.g., cat vs. dog) | Predicts continuous values (e.g., temperature) |
| Output | Discrete values (e.g., 0 or 1) | Continuous values (e.g., 45.6°C) |
| Example | Spam detection (spam/not spam) | Predicting house prices |

Q.3 (B) Explain Linear Regression with an example.

Linear Regression is a statistical method to predict a continuous variable based on the relationship between independent (X) and dependent (Y) variables. The equation is:

Y = mX + c

Where:

●​ Y = dependent variable

●​ X = independent variable

●​ m = slope

●​ c = intercept

Example:​
Predicting house prices based on size (sq ft). If:

\text{Price} = 5000 \times \text{Size} + 20000

Then for a house of 1000 sq ft:

\text{Price} = (5000 \times 1000) + 20000 = 5{,}020{,}000

Q.3 (C) Explain the Decision Tree Algorithm with an example.

A Decision Tree is a tree-like model used for classification and regression. It splits data into
branches based on feature values.
Example:​
For predicting whether a student will pass an exam:

●​ If study hours > 3, then pass​

●​ If study hours ≤ 3, then fail​

Q.3 (A) Explain Polynomial Regression.

Polynomial Regression is a type of regression where the relationship between independent and dependent variables is modeled as an nth-degree polynomial:

Y = a_0 + a_1X + a_2X^2 + ... + a_nX^n

Used when data follows a curved pattern rather than a straight line.

Example: Predicting population growth using a quadratic equation.

Q.3 (B) Discuss the concept of ensemble learning and how it is utilized in
random forests.

Ensemble Learning combines multiple models to improve accuracy.

Random Forest is an ensemble of multiple Decision Trees. Each tree is trained on a random
subset of the data, and the final prediction is based on majority voting (classification) or
averaging (regression).

Advantages:

●​ Reduces overfitting​

●​ Improves accuracy​

Q.3 (C) Explain the concept of Support Vector Machine with an example.

A Support Vector Machine (SVM) is a supervised learning algorithm that finds the best
decision boundary (hyperplane) to classify data.
Example:​
For classifying emails as spam or not spam, SVM finds the best boundary between the two
categories.

Q.4 (Attempt any 4 out of 6, Each Question of 3 Marks)

(1) Which Evaluation Metrics do we use for the Classification Problem? Explain any three.

1.​ Accuracy – Percentage of correctly classified instances.​

2.​ Precision – Ratio of true positives to total predicted positives.​

3.​ Recall – Ratio of true positives to actual positives.​
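A minimal sketch computing all three metrics with scikit-learn; the toy labels and predictions are assumptions for the example:

```python
# Accuracy, precision, and recall from scikit-learn (toy predictions assumed).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))    # correct / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:", recall_score(y_true, y_pred))        # TP / (TP + FN)
```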

(2) Differentiate between supervised and unsupervised learning.


| Feature | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Labeled Data | Uses labeled data | Uses unlabeled data |
| Purpose | Classification & regression | Clustering & pattern discovery |
| Example | Spam detection | Customer segmentation |

(3) Define the K-Means algorithm.

K-Means is a clustering algorithm that partitions data into K clusters based on feature similarity.
It minimizes the variance within each cluster.

Steps:

1.​ Select K cluster centers.​

2.​ Assign each point to the nearest cluster.​

3.​ Update the cluster centers and repeat until convergence.​

(4) Challenges and advantages of unsupervised learning compared to supervised learning.

Advantages:
●​ No need for labeled data​

●​ Identifies hidden patterns​

●​ Useful for exploratory analysis​

Challenges:

●​ Hard to evaluate results​

●​ May group unrelated data​

●​ Needs fine-tuning​

(5) What is Dimensionality Reduction? List the methods to reduce dimensions.

Dimensionality Reduction reduces the number of features while preserving essential information.

Methods:

1.​ Principal Component Analysis (PCA)​

2.​ t-SNE (t-Distributed Stochastic Neighbor Embedding)​

3.​ Autoencoders​

(6) What is a clustering method? List down the types of clustering.

Clustering is an unsupervised learning technique that groups similar data points together.

Types of Clustering:

1.​ Partitioning-based (K-Means)​

2.​ Hierarchical (Agglomerative, Divisive)​

3.​ Density-based (DBSCAN)​

4.​ Fuzzy Clustering (Fuzzy C-Means)​

