ML Mid1 Myans
One of the main issues in Machine Learning is the absence of good data. While enhancing their algorithms, developers end up exhausting most of their time dealing with this data problem rather than with the model itself.
Although AI-driven software helps to successfully detect credit card fraud, for example, there are issues in Machine Learning that can make the process ineffective.
Although numerous individuals are drawn into the ML business, there are still not many experts who have complete command of this technology.
5) Implementation
Organizations often already have analytics engines in place when they decide to move up to ML. Integrating newer ML techniques with these existing processes is a complicated task.
6) Missing Data
Most ML models cannot handle datasets containing missing data points. As a result, features that contain a large proportion of missing values often have to be dropped.
7) Deficient Infrastructure
ML requires a tremendous amount of data-processing capability. Legacy systems cannot handle the workload and buckle under the pressure.
8) Having Algorithms Become Obsolete When Data Grows
ML algorithms will always require a large amount of data for training. Frequently, an algorithm is trained on a specific dataset and then used to predict future data, a process that is hard to keep working correctly as the data grows and changes.
9) Absence Of Skilled Resources
Another issue in Machine Learning is that deep analytics and ML, in their present form, are still relatively new technologies, so skilled professionals remain scarce.
• Neural Networks
• Naive Bayesian Model
• Classification
• Support Vector Machines
• Regression
• Random Forest Model
11) Complexity
Although Machine Learning and Artificial Intelligence are booming, much of the field is still in an experimental phase, proceeding largely by trial and error.
13) Maintenance
The required results for different actions are bound to change over time, and hence the data needed to produce them changes as well, which makes ongoing maintenance necessary.
This occurs when the target variable changes over time (concept drift), so the delivered results become inaccurate. This causes the models to decay, because they cannot easily adapt to, or be upgraded for, such changes.
This occurs when certain aspects of a dataset are given more importance (weight) than others.
Many algorithms contain biased programming, which leads to biased datasets; such a model will not deliver the right output and will produce irrelevant information.
Machine Learning is often termed a "black box", because deciphering how an algorithm reached its outcome is often complex, and the explanation is sometimes of little use.
2. Distinguish between training loss vs testing loss.
Training loss is the error a model makes on the data it was fitted on, while testing loss is the error measured on held-out data the model has never seen. Training loss tells us how well the model has learned the training set; testing loss estimates how well it will generalize to new data. A low training loss combined with a much higher testing loss is the classic sign of overfitting.
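A minimal sketch (assuming numpy and scikit-learn are available; the synthetic data and the degree-15 polynomial are only chosen to exaggerate the gap) that makes the distinction concrete:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A deliberately flexible model (degree-15 polynomial) that overfits the training set
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

train_loss = mean_squared_error(y_train, model.predict(X_train))
test_loss = mean_squared_error(y_test, model.predict(X_test))
print(f"Training MSE: {train_loss:.3f}   Testing MSE: {test_loss:.3f}")
```

The training MSE here plays the role of training loss and the held-out MSE the role of testing loss.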
3. Build the K-Nearest Neighbor learning algorithm with an example dataset.
The *K-Nearest Neighbors (K-NN)* algorithm is a simple, supervised learning algorithm used for both
classification and regression. The basic idea is that given a new data point, the algorithm looks at the
"k" closest data points (neighbors) from the training set and makes a prediction based on majority
voting (classification) or averaging (regression).
1. *Choose K*, the number of nearest neighbors to consider.
2. *Calculate the distance* between the new data point and all the data points in the training set.
3. *Select the K-nearest neighbors* (the data points with the smallest distances to the new data
point).
4. *For classification*: Assign the class that is most common among the K-nearest neighbors.
5. *For regression*: Predict the output by averaging the values of the K-nearest neighbors.
Let’s use a simple 2D dataset where each data point has two features (e.g., height and weight), and
we want to classify the data points into two classes: 0 or 1.
#### Dataset:
| Height (cm) | Weight (kg) | Class (0/1) |
|-------------|-------------|-------------|
| 160 | 55 |0 |
| 170 | 65 |0 |
| 180 | 75 |1 |
| 175 | 70 |1 |
| 155 | 50 |0 |
| 165 | 60 |0 |
| 185 | 80 |1 |
| 190 | 85 |1 |
Now, let's implement the K-NN algorithm and use it to classify a new point with the features:
- *Height = 178 cm*
- *Weight = 72 kg*
1. *Choose K*:
Let's set \( K = 3 \). This means the algorithm will find the 3 closest data points to the new point.
2. *Calculate the distances*:
We'll use *Euclidean distance* to measure the distance between the new point and each data point in the dataset. For two points \((x_1, y_1)\) and \((x_2, y_2)\), the Euclidean distance is given by:
\[
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
\]
Now, we calculate the distances from this new point to each data point in the dataset:
| Height (cm) | Weight (kg) | Class | Distance to (178, 72) |
|-------------|-------------|-------|-----------------------|
| 160 | 55 | 0 | 24.76 |
| 170 | 65 | 0 | 10.63 |
| 180 | 75 | 1 | 3.61 |
| 175 | 70 | 1 | 3.61 |
| 155 | 50 | 0 | 31.83 |
| 165 | 60 | 0 | 17.69 |
| 185 | 80 | 1 | 10.63 |
| 190 | 85 | 1 | 17.69 |
3. *Select the 3 nearest neighbors*: (180, 75) and (175, 70), both class 1, and (170, 65), class 0. (The point (185, 80) is equally close at 10.63; choosing it instead would not change the outcome.)
By majority voting, the new data point (178, 72) is classified as class *1*.
The new data point with Height = 178 cm and Weight = 72 kg is classified as *Class 1* (based on the majority of its 3 nearest neighbors).
### Summary:
- The K-NN algorithm classifies new data points based on the majority class of the K nearest
neighbors.
- In this example, the new point (178 cm, 72 kg) was classified as class *1* because two out of its
three nearest neighbors were from class 1.
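For completeness, here is a minimal sketch of the same example in Python (assuming scikit-learn is available); it should reproduce the hand calculation above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data from the table above: [height_cm, weight_kg] -> class
X = np.array([[160, 55], [170, 65], [180, 75], [175, 70],
              [155, 50], [165, 60], [185, 80], [190, 85]])
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# K = 3 with the default (Euclidean) distance metric
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[178, 72]]))  # expected: [1], matching the hand calculation
```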
Classification and regression are two fundamental types of predictive modeling techniques used in
machine learning. Both involve predicting an output variable based on one or more input variables,
but they differ in the type of output they produce:
Classification is used when the output variable is categorical (e.g., yes/no, spam/not spam).
Regression is used when the output variable is continuous (e.g., price, temperature).
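As a quick illustration (a sketch assuming scikit-learn; the feature values are made up), the same kind of input can feed either a classifier or a regressor depending on the type of target:

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[1200], [1500], [1800], [2200]]        # square footage (hypothetical values)
y_class = [0, 0, 1, 1]                      # categorical target: overpriced or not
y_price = [150000, 180000, 220000, 270000]  # continuous target: sale price

clf = LogisticRegression().fit(X, y_class)  # classification
reg = LinearRegression().fit(X, y_price)    # regression

print(clf.predict([[2000]]))  # a class label (0 or 1)
print(reg.predict([[2000]]))  # a continuous number
```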
Let’s examine how regression is applied in a classification context using an example of predicting
whether a house is overpriced or not based on various features.
• Square footage
• Number of bedrooms
• Number of bathrooms
• Location
• Age of the house
Collect data on various houses, including their features and whether they were classified as
overpriced based on their sale price compared to market averages.
| Square Footage | Bedrooms | Bathrooms | Location | Age | Overpriced |
|----------------|----------|-----------|----------|-----|------------|
| 1500 | 2 | 1 | B | 20 | 0 |
| 2000 | 3 | 2 | A | 5 | 0 |
| 2500 | 4 | 3 | B | 10 | 1 |
| 1800 | 3 | 2 | A | 15 | 0 |
| 3000 | 5 | 4 | C | 1 | 1 |
• Logistic Regression
• Decision Trees
• Random Forest
For this example, let’s choose Logistic Regression, which is often used for binary classification
problems.
Encoding Categorical Variables: Convert categorical variables like "Location" into numerical
values (e.g., using one-hot encoding).
1. Split the Data: Divide the dataset into training and test sets (e.g., 80% training, 20% test).
2. Train the Model: Use the training set to fit the logistic regression model.
After training the model, we evaluate its performance using the test set:
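A minimal sketch of these steps, assuming pandas and scikit-learn are available; the DataFrame simply re-creates the tiny illustrative table above, so the train/test split here is only for demonstration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# The tiny illustrative dataset from the table above
df = pd.DataFrame({
    "sqft":       [1500, 2000, 2500, 1800, 3000],
    "bedrooms":   [2, 3, 4, 3, 5],
    "bathrooms":  [1, 2, 3, 2, 4],
    "location":   ["B", "A", "B", "A", "C"],
    "age":        [20, 5, 10, 15, 1],
    "overpriced": [0, 0, 1, 0, 1],
})

# One-hot encode the categorical "location" column
X = pd.get_dummies(df.drop(columns="overpriced"), columns=["location"])
y = df["overpriced"]

# 80/20 split (only meaningful with a real, larger dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit logistic regression and evaluate on the held-out row(s)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```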
Step 7: Interpretation of Results
Conclusion
Ensemble learning comes with its own practical challenges: selecting the right combination of models to include in the ensemble, determining the optimal
weighting of each model's predictions, and managing the computational resources required to train
and evaluate multiple models simultaneously. Additionally, ensemble learning may not always
improve performance if the individual models are too similar or if the training data has a high degree
of noise. The diversity of the models—in terms of algorithms, feature processing, and data
perspectives—is vital to covering a broader spectrum of data patterns. Optimal weighting of each
model's contribution, often based on performance metrics, is crucial to harnessing their collective
predictive power. Therefore, careful consideration and experimentation are necessary to achieve the
desired results with ensemble learning.
Computational Complexity
Ensemble learning, involving multiple algorithms and feature sets, requires more computational
resources than individual models. While parallel processing offers a solution, orchestrating an
ensemble of models across multiple processors can introduce complexity in both implementation
and maintenance. Also, more computation might not always lead to better performance, especially if
the ensemble is not set up correctly or if the models amplify each other's errors in noisy datasets.
Model Diversity
Ensemble learning requires diverse models to avoid bias and enhance accuracy. By incorporating
different algorithms, feature sets, and training data, ensemble learning captures a wider range of
patterns, reducing the risk of overfitting and ensuring the ensemble can handle various scenarios and
make accurate predictions in different contexts. Strategies such as cross-validation help in evaluating
the ensemble's consistency and reliability, ensuring the ensemble is robust against different data
scenarios.
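As a rough illustration of these two ideas (diversity across algorithm families, and cross-validation as a robustness check), here is a minimal sketch assuming scikit-learn and its built-in breast cancer dataset; the particular estimators, parameters, and soft-voting scheme are only example choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Three different algorithm families give the ensemble diverse views of the data
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities; per-model weights could also be set
)

# Cross-validation checks how consistent the ensemble is across different data splits
scores = cross_val_score(ensemble, X, y, cv=5)
print("Accuracy per fold:", scores.round(3))
```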
Interpretability
Ensemble learning models prioritize accuracy over interpretability, resulting in highly accurate
predictions. However, this trade-off makes the ensemble model more challenging to interpret.
Techniques like feature importance analysis and model introspection can help provide insights but
may not fully demystify the predictions of complex ensembles; at best, they highlight the factors contributing to an ensemble model's decision-making, which reduces the interpretability challenge.
7. Naive Bayes
The Naive Bayes classifier is a probabilistic machine learning model based on Bayes' Theorem, which
assumes that the features (predictors) are independent of each other, given the class label. Despite
this simplifying assumption (hence "naive"), it performs quite well in various real-world applications
such as spam filtering, document classification, and medical diagnosis.
Key Components
1. Bayes' Theorem
Bayes' Theorem provides a way to update the probability estimate for a hypothesis given new evidence. Mathematically, it's expressed as:
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
Where:
• \(P(A \mid B)\) is the posterior probability: the probability of the hypothesis \(A\) being true given the evidence \(B\).
• \(P(B \mid A)\) is the likelihood: the probability of the evidence \(B\) given that the hypothesis \(A\) is true.
• \(P(A)\) is the prior probability: the initial probability of the hypothesis \(A\) before seeing the evidence.
• \(P(B)\) is the evidence: the overall probability of observing \(B\).
2. Conditional Independence Assumption
In a Naive Bayes classifier, we assume that the features (evidence) are conditionally independent given the class label. This simplifies the computation of the likelihood:
\[
P(X_1, X_2, \dots, X_n \mid C_k) = P(X_1 \mid C_k)\, P(X_2 \mid C_k) \cdots P(X_n \mid C_k)
\]
Where \(X_1, \dots, X_n\) are the features and \(C_k\) is the class label.
3. Classification Rule
To classify a new instance \(X = (X_1, X_2, \dots, X_n)\), we compute the posterior probability for each class \(C_k\) and choose the class with the highest probability:
\[
\hat{C} = \arg\max_{C_k} \; P(C_k) \prod_{i=1}^{n} P(X_i \mid C_k)
\]
There are several types of Naive Bayes classifiers depending on the distribution of the data:
1. Gaussian Naive Bayes: Used when the features are continuous and assumed to follow a
normal (Gaussian) distribution.
2. Multinomial Naive Bayes: Typically used for discrete data such as document classification,
where the features represent counts or frequencies (e.g., word occurrences in text).
3. Bernoulli Naive Bayes: A special case of the Multinomial Naive Bayes used for
binary/Boolean features.
Pros:
Computationally efficient.
Cons:
If a category in the test data is not seen in the training data, it assigns a probability of 0 to
that feature, which can be mitigated by techniques like Laplace Smoothing.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Gaussian Naive Bayes classifier and predict on the test set
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Evaluate accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
```
This example uses the Iris dataset, which contains features like sepal length and petal width, to
predict the species of the flower using a Gaussian Naive Bayes classifier. The model is trained on a
subset of the data, and the accuracy is calculated based on the predictions made on the test set.
8. SVM
Support Vector Machine (SVM) is a popular supervised learning algorithm used for classification,
regression, and outlier detection tasks. It is particularly known for its effectiveness in high-
dimensional spaces and when the number of features exceeds the number of samples.
1. Hyperplane: In SVM, a hyperplane is the decision boundary that separates different classes.
In a 2D space, it's a line; in 3D, it’s a plane. For higher dimensions, it's called a hyperplane.
2. Support Vectors: These are the data points that are closest to the decision boundary
(hyperplane). The SVM algorithm tries to find the optimal hyperplane by maximizing the
margin (distance) between the support vectors of different classes.
3. Margin: The margin is the distance between the hyperplane and the closest support vector
from either class. SVM aims to find the hyperplane with the maximum margin, which
ensures better generalization.
4. Kernel Trick: For non-linearly separable data, SVM uses a "kernel function" to transform the
data into a higher-dimensional space where a linear hyperplane can be used to separate the
classes. Common kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
5. Soft Margin SVM: If the data is not linearly separable, a "soft margin" SVM allows some
misclassifications to achieve better generalization by introducing a regularization parameter
C, which controls the trade-off between maximizing the margin and minimizing classification
errors.
Text classification: SVM is often used in Natural Language Processing (NLP) tasks, such as
spam detection.
Image recognition: SVM can be applied in image classification tasks, such as facial
recognition.
Pros:
• Effective in high-dimensional spaces, even when the number of features exceeds the number of samples.
• Memory efficient, since the decision function depends only on the support vectors.
Cons:
• Training can be slow on very large datasets.
• Performance depends heavily on the choice of kernel and the regularization parameter C, and SVMs do not directly provide probability estimates.
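As a rough illustration of the ideas above (an RBF kernel plus the soft-margin parameter C), here is a minimal sketch assuming scikit-learn; the dataset and parameter values are placeholders rather than a tuned solution:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling matters for SVMs; C controls the soft-margin trade-off
# and the RBF kernel handles data that is not linearly separable.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```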
9. Logistic regression
Logistic Regression is a simple yet powerful statistical method used for binary classification—when
the outcome can take one of two values, like yes/no, spam/not spam, or success/failure. Despite its
name, logistic regression is not actually a regression algorithm; it’s used for classification tasks.
Core Idea
The goal of logistic regression is to model the probability that a given input belongs to a particular
class. For instance, in a spam email classifier, the model predicts the probability that an email is spam
or not.
If you try to use linear regression for classification, the predictions might result in any number
(positive or negative), but probabilities should always be between 0 and 1. This is where logistic
regression comes in.
Logistic regression uses a special function called the logistic function (also known as the sigmoid
function) to squeeze the output of a linear equation between 0 and 1, making it suitable for
probability predictions.
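Concretely, the model passes a linear combination of the input features through the sigmoid:
\[
P(y = 1 \mid X) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
\]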
Step-by-Step Explanation
1. Input Features: The model takes in one or more features (input variables). For example, if we
are predicting whether an email is spam, the features could be things like the number of
links, the presence of certain keywords, or the email length.
Training the Model
The weights \(w_0, w_1, \dots, w_n\) are learned during training using a
method called maximum likelihood estimation (MLE). This essentially means finding the parameters
that maximize the likelihood of the observed data.
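In other words, training chooses the weights that maximize the log-likelihood of the observed labels:
\[
\ell(w) = \sum_{i=1}^{m} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log\big(1 - \hat{p}_i\big) \Big], \qquad \hat{p}_i = \sigma\big(w_0 + w_1 x_{i1} + \dots + w_n x_{in}\big)
\]
where \(y_i \in \{0, 1\}\) is the label of the \(i\)-th training example and \(m\) is the number of examples.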
Performance Metrics
Precision, Recall, F1-score: Metrics that consider false positives and false negatives, which
are important in imbalanced datasets.
ROC Curve & AUC: These help visualize how well the model distinguishes between classes
across various thresholds.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset and split into training and test sets
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = log_reg.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
This example uses the breast cancer dataset to predict whether a tumor is malignant or benign. The
accuracy of the model on the test set is then calculated.
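To go beyond plain accuracy, the metrics listed under Performance Metrics can be computed on the same test split; a small sketch (assuming scikit-learn, and reusing log_reg, X_test, y_test, and y_pred from the example above):

```python
from sklearn.metrics import classification_report, roc_auc_score

# Precision, recall, and F1-score per class
print(classification_report(y_test, y_pred))

# ROC AUC uses predicted probabilities for the positive class
y_prob = log_reg.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, y_prob))
```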
Pros:
• Simple, fast to train, and easy to interpret.
• Outputs class probabilities rather than just labels.
Cons:
• Assumes linearity between the input features and the log-odds of the outcome.
• May not perform well with complex relationships unless interactions or transformations are added.
Conclusion
Logistic regression is a great starting point for classification problems, especially when the
relationship between the input features and the target variable is approximately linear. It's easy to
interpret and can be extended to handle multiple classes (with multinomial logistic regression).