ML Mid1 Answers

MACHINE LEARNING


1. List and elaborate on the issues of Machine Learning.

1) Lack Of Quality Data

One of the main issues in Machine Learning is the absence of good-quality data. While refining their algorithms, developers end up exhausting most of their time on acquiring and cleaning data.

• Data can be noisy, which results in inaccurate predictions.

• Incorrect or incomplete information can also lead to faulty Machine Learning models.
2) Fault In Credit Card Fraud Detection

Although AI-driven software helps detect credit card fraud successfully, there are issues in Machine Learning that can make the process unreliable.

3) Getting Bad Recommendations


Recommendation engines are very common today. While some are reliable, others may not provide accurate results, because Machine Learning algorithms tend to keep reinforcing what the engine has already suggested rather than exploring new options.
4) Talent Deficit

Although many people are drawn to the ML industry, there are still very few experts who can take complete ownership of this technology.

5) Implementation

Organizations often already have analytics engines in place when they decide to move up to ML. Integrating newer ML techniques with these existing processes is a complicated task.

6) Making The Wrong Assumptions

Most ML models cannot handle datasets containing missing data points. As a result, features with a large proportion of missing values often have to be dropped, which can lead the model to make wrong assumptions.

7) Deficient Infrastructure
ML requires tremendous data-churning capability. Legacy systems cannot handle the workload and buckle under the pressure.
8) Having Algorithms Become Obsolete When Data Grows

ML algorithms require a lot of data to be trained. Frequently, they are trained on one particular dataset and then used to predict future data, a cycle they can only keep up with if they are retrained as the data grows, which takes a significant amount of effort.
9) Absence Of Skilled Resources

Another issue in Machine Learning is that deep analytics and ML in their present form are still relatively new technologies, so skilled practitioners are scarce.

10) Customer Segmentation


Consider data on a user's behaviour during a trial period together with the relevant past behaviour. An algorithm is then needed to recognize which customers will convert to the paid version of a product and which will not.

Commonly used supervised learning algorithms in ML are:

• Neural Networks
• Naive Bayesian Model
• Classification
• Support Vector Machines
• Regression
• Random Forest Model
11) Complexity

Although Machine Learning and Artificial Intelligence are booming, a majority of these sectors are
still in their experimental phases, actively undergoing a trial and error method.

12) Slow Results


Another common issue in Machine Learning is slowness. Machine Learning models can be highly effective and produce accurate results, but those results take time to be produced.

13) Maintenance

The results required for different actions are bound to change over time, and hence the data needed changes as well, so models must be regularly retrained and maintained.

14) Concept Drift

This occurs when the statistical properties of the target variable change over time, making the delivered results inaccurate. Models decay because they cannot easily adapt to, or be upgraded for, these changes.

15) Data Bias

This occurs when certain elements of a dataset are given more weight or representation than others.

16) High Chances Of Error

Many algorithms are trained on biased or error-prone datasets, which leads to biased models. Such models do not deliver the right output and produce irrelevant information.

17) Lack Of Explainability

Machine Learning is often termed a "black box" because deciphering how an algorithm arrived at its outcome is complex and sometimes impossible.
2. Distinguish between training loss vs testing loss.

Training loss is the error a model makes on the same data it was fitted on, while testing loss is the error measured on held-out data the model has never seen. Training loss typically decreases as training proceeds, but a testing loss that is much higher than the training loss indicates overfitting; a well-generalizing model keeps the two losses close.
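The gap can be measured directly. A minimal sketch, assuming scikit-learn and a deliberately over-flexible model (the polynomial degree and noise level are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Noisy synthetic 1-D regression data
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A deliberately flexible model that can overfit the training set
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

# Training loss: error on the data the model was fitted on
train_loss = mean_squared_error(y_train, model.predict(X_train))
# Testing loss: error on held-out data; a large gap signals overfitting
test_loss = mean_squared_error(y_test, model.predict(X_test))
print(f"Training loss: {train_loss:.3f}  Testing loss: {test_loss:.3f}")
```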
3.Build the K-Nearest Neighbor learning algorithm with an example dataset

The *K-Nearest Neighbors (K-NN)* algorithm is a simple, supervised learning algorithm used for both
classification and regression. The basic idea is that given a new data point, the algorithm looks at the
"k" closest data points (neighbors) from the training set and makes a prediction based on majority
voting (classification) or averaging (regression).

### Steps for K-NN Algorithm:

1. *Choose the number of neighbors (K)*.

2. *Calculate the distance* between the new data point and all the data points in the training set.

3. *Select the K-nearest neighbors* (the data points with the smallest distances to the new data
point).

4. *For classification*: Assign the class that is most common among the K-nearest neighbors.

5. *For regression*: Predict the output by averaging the values of the K-nearest neighbors.

### Example Dataset:

Let’s use a simple 2D dataset where each data point has two features (e.g., height and weight), and
we want to classify the data points into two classes: 0 or 1.

#### Dataset:

| Height (cm) | Weight (kg) | Class (0/1) |
|-------------|-------------|-------------|
| 160         | 55          | 0           |
| 170         | 65          | 0           |
| 180         | 75          | 1           |
| 175         | 70          | 1           |
| 155         | 50          | 0           |
| 165         | 60          | 0           |
| 185         | 80          | 1           |
| 190         | 85          | 1           |

Now, let's implement the K-NN algorithm and use it to classify a new point with the features:

- *Height = 178 cm*

- *Weight = 72 kg*

### Step-by-Step Implementation:

1. *Choose K*:

Let's set \( K = 3 \). This means the algorithm will find the 3 closest data points to the new point.

2. *Calculate the Distance*:

We'll use *Euclidean distance* to measure the distance between the new point and each data
point in the dataset. For two points \((x_1, y_1)\) and \((x_2, y_2)\), the Euclidean distance is given
by:

\[

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}

\]

- *New Point*: (Height = 178, Weight = 72)

Now, we calculate the distances from this new point to each data point in the dataset:
| Height (cm) | Weight (kg) | Class | Distance to (178, 72) |
|-------------|-------------|-------|-----------------------|
| 160 | 55 | 0 | \( \sqrt{(178 - 160)^2 + (72 - 55)^2} = \sqrt{324 + 289} = 24.76 \) |
| 170 | 65 | 0 | \( \sqrt{(178 - 170)^2 + (72 - 65)^2} = \sqrt{64 + 49} = 10.63 \) |
| 180 | 75 | 1 | \( \sqrt{(178 - 180)^2 + (72 - 75)^2} = \sqrt{4 + 9} = 3.61 \) |
| 175 | 70 | 1 | \( \sqrt{(178 - 175)^2 + (72 - 70)^2} = \sqrt{9 + 4} = 3.61 \) |
| 155 | 50 | 0 | \( \sqrt{(178 - 155)^2 + (72 - 50)^2} = \sqrt{529 + 484} = 31.83 \) |
| 165 | 60 | 0 | \( \sqrt{(178 - 165)^2 + (72 - 60)^2} = \sqrt{169 + 144} = 17.69 \) |
| 185 | 80 | 1 | \( \sqrt{(178 - 185)^2 + (72 - 80)^2} = \sqrt{49 + 64} = 10.63 \) |
| 190 | 85 | 1 | \( \sqrt{(178 - 190)^2 + (72 - 85)^2} = \sqrt{144 + 169} = 17.69 \) |

3. *Select the K-Nearest Neighbors*:

We now sort the distances and select the K = 3 nearest neighbors:

| Height (cm) | Weight (kg) | Class | Distance to (178, 72) |
|-------------|-------------|-------|-----------------------|
| 180 | 75 | 1 | 3.61 |
| 175 | 70 | 1 | 3.61 |
| 170 | 65 | 0 | 10.63 |

(The point (185, 80) also lies at distance 10.63 and ties for third place; either choice of third neighbor leads to the same majority class below.)

4. *Classify the New Data Point*:

Out of the 3 nearest neighbors:

- 2 neighbors belong to class *1* (180, 175)

- 1 neighbor belongs to class *0* (170)

By majority voting, the new data point (178, 72) is classified as class *1*.

### Final Result:

The new data point with Height = 178 cm and Weight = 72 kg is classified as *Class 1* (based on the majority of its 3 nearest neighbors).
### Summary:

- The K-NN algorithm classifies new data points based on the majority class of the K nearest
neighbors.

- In this example, the new point (178 cm, 72 kg) was classified as class *1* because two out of its three nearest neighbors were from class 1.
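The same procedure can be expressed in a few lines of Python. The following is a minimal sketch, assuming NumPy is available; the dataset, the new point (178, 72), and K = 3 are taken from the example above:

```python
import numpy as np
from collections import Counter

# Training data from the example: (height cm, weight kg) -> class
X_train = np.array([[160, 55], [170, 65], [180, 75], [175, 70],
                    [155, 50], [165, 60], [185, 80], [190, 85]])
y_train = np.array([0, 0, 1, 1, 0, 0, 1, 1])

def knn_predict(x_new, X, y, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest labels
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([178, 72]), X_train, y_train, k=3))  # -> 1
```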

4. Examine classification versus regression with an example.

Classification and regression are two fundamental types of predictive modeling techniques used in machine learning. Both involve predicting an output variable based on one or more input variables, but they differ in the type of output they produce:

• Classification is used when the output variable is categorical (e.g., yes/no, spam/not spam).

• Regression is used when the output variable is continuous (e.g., price, temperature).

Example: Predicting House Prices

Let’s examine how regression is applied in a classification context using an example of predicting
whether a house is overpriced or not based on various features.

Step 1: Define the Problem

Suppose we want to classify houses as either "Overpriced" or "Not Overpriced."

• Features (Input Variables):

o Square footage

o Number of bedrooms

o Number of bathrooms

o Location (e.g., zip code)

o Age of the house

o Recent sale prices of similar houses

• Output Variable (Target):

o Class: Overpriced (1) or Not Overpriced (0)

Step 2: Data Collection

Collect data on various houses, including their features and whether they were classified as
overpriced based on their sale price compared to market averages.

| Square Footage | Bedrooms | Bathrooms | Location | Age | Overpriced |
|----------------|----------|-----------|----------|-----|------------|
| 2000 | 3 | 2 | A | 5  | 0 |
| 2500 | 4 | 3 | B | 10 | 1 |
| 1800 | 3 | 2 | A | 15 | 0 |
| 3000 | 5 | 4 | C | 1  | 1 |
| 1500 | 2 | 1 | B | 20 | 0 |

Step 3: Choose a Classification Algorithm

We can use various algorithms for classification, such as:

• Logistic Regression

• Decision Trees

• Random Forest

• Support Vector Machines (SVM)

For this example, let’s choose Logistic Regression, which is often used for binary classification
problems.

Step 4: Data Preprocessing

• Handling Missing Values: Fill or remove any missing data.

• Encoding Categorical Variables: Convert categorical variables like "Location" into numerical values (e.g., using one-hot encoding).

• Feature Scaling: Scale numerical features if necessary.

Step 5: Model Training

1. Split the Data: Divide the dataset into training and test sets (e.g., 80% training, 20% test).

2. Train the Model: Use the training set to fit the logistic regression model.

Step 6: Model Evaluation

After training the model, we evaluate its performance on the test set using the metrics listed in Step 7; a short code sketch of Steps 3-6 follows after that list.
Step 7: Interpretation of Results

• Accuracy: Indicates the percentage of correctly classified houses.

• Precision: Measures the accuracy of positive predictions (overpriced).

• Recall: Measures the ability to find all positive instances.

• F1 Score: Harmonic mean of precision and recall.
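As an illustrative sketch only (the five-row table above is far too small for a real model, and the column names here are hypothetical shorthand), Steps 3 to 6 could be wired together with scikit-learn roughly as follows:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# The example dataset from Step 2 (in practice, far more rows are needed)
df = pd.DataFrame({
    "sqft":       [2000, 2500, 1800, 3000, 1500],
    "bedrooms":   [3, 4, 3, 5, 2],
    "bathrooms":  [2, 3, 2, 4, 1],
    "location":   ["A", "B", "A", "C", "B"],
    "age":        [5, 10, 15, 1, 20],
    "overpriced": [0, 1, 0, 1, 0],
})

# Step 4: one-hot encode the categorical "location" column
X = pd.get_dummies(df.drop(columns="overpriced"), columns=["location"])
y = df["overpriced"]

# Step 5: split and train (stratified so both classes appear in each split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 6: evaluate on the held-out set
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```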

Conclusion

In this example, we used a regression-like approach (logistic regression) to classify houses as


overpriced or not based on their features. This illustrates how classification can work with a
structured dataset to make binary decisions, showcasing the fundamental distinction between
regression and classification in predictive modeling.

5. What do you mean by Ensemble learning?

Ensemble learning is a technique in which multiple individual models (base or "weak" learners) are trained and their predictions are combined, for example by voting, averaging, bagging, boosting, or stacking, so that the combined model is more accurate and robust than any single model on its own.


6. What are the main challenges in developing ensemble learning?

Challenges and Considerations in Ensemble Learning

Model Selection and Weighting

Key challenges include selecting the right combination of models to include in the ensemble, determining the optimal weighting of each model's predictions, and managing the computational resources required to train and evaluate multiple models simultaneously. Additionally, ensemble learning may not always improve performance if the individual models are too similar or if the training data has a high degree of noise. The diversity of the models, in terms of algorithms, feature processing, and data perspectives, is vital to covering a broader spectrum of data patterns. Optimal weighting of each model's contribution, often based on performance metrics, is crucial to harnessing their collective predictive power. Therefore, careful consideration and experimentation are necessary to achieve the desired results with ensemble learning.
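As a hedged sketch of what model selection and weighting can look like in practice (the base models, weights, and the Iris dataset here are arbitrary illustrative choices, not a prescription), scikit-learn's VotingClassifier combines several learners and lets you weight their votes:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Three deliberately different base models to encourage diversity
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",        # average predicted class probabilities
    weights=[2, 3, 1],    # illustrative weights; tuning them is part of the challenge
)

# Cross-validation gives a sense of how consistent the ensemble is
print(cross_val_score(ensemble, X, y, cv=5).mean())
```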

Computational Complexity

Ensemble learning, involving multiple algorithms and feature sets, requires more computational
resources than individual models. While parallel processing offers a solution, orchestrating an
ensemble of models across multiple processors can introduce complexity in both implementation
and maintenance. Also, more computation might not always lead to better performance, especially if
the ensemble is not set up correctly or if the models amplify each other's errors in noisy datasets.

Diversity and Overfitting

Ensemble learning requires diverse models to avoid bias and enhance accuracy. By incorporating
different algorithms, feature sets, and training data, ensemble learning captures a wider range of
patterns, reducing the risk of overfitting and ensuring the ensemble can handle various scenarios and
make accurate predictions in different contexts. Strategies such as cross-validation help in evaluating
the ensemble's consistency and reliability, ensuring the ensemble is robust against different data
scenarios.

Interpretability

Ensemble learning models prioritize accuracy over interpretability, so while they produce highly accurate predictions, the resulting model is more challenging to interpret. Techniques like feature importance analysis and model introspection can shed light on the factors contributing to an ensemble's decision-making and reduce the interpretability challenge, but they may not fully demystify the predictions of complex ensembles.

7. Naive Bayes classifier.

The Naive Bayes classifier is a probabilistic machine learning model based on Bayes' Theorem, which
assumes that the features (predictors) are independent of each other, given the class label. Despite
this simplifying assumption (hence "naive"), it performs quite well in various real-world applications
such as spam filtering, document classification, and medical diagnosis.

Key Components

1. Bayes' Theorem

Bayes' Theorem provides a way to update the probability estimate for a hypothesis given new
evidence. Mathematically, it's expressed as:

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Where:

• \(P(A|B)\) is the posterior probability: the probability of the hypothesis \(A\) being true given the evidence \(B\).

• \(P(B|A)\) is the likelihood: the probability of observing the evidence \(B\) given that the hypothesis \(A\) is true.

• \(P(A)\) is the prior probability: the initial probability of the hypothesis \(A\) before seeing the evidence.

• \(P(B)\) is the marginal likelihood: the probability of observing the evidence.
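As a quick worked illustration with made-up numbers: suppose 30% of all emails are spam, the word "offer" appears in 40% of spam emails and in 10% of legitimate ones. Then

\[
P(\text{spam} \mid \text{offer}) = \frac{P(\text{offer} \mid \text{spam})\,P(\text{spam})}{P(\text{offer})} = \frac{0.4 \times 0.3}{0.4 \times 0.3 + 0.1 \times 0.7} = \frac{0.12}{0.19} \approx 0.63,
\]

so observing the word raises the spam probability from 0.30 to about 0.63.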

2. Naive Independence Assumption

In a Naive Bayes classifier, we assume that the features (evidence) are conditionally independent
given the class label. This simplifies the computation of the likelihood:

\[
P(X_1, X_2, \dots, X_n \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_n \mid C)
\]

Where:

• \(X_1, X_2, \dots, X_n\) are the features, and

• \(C\) is the class label.

3. Classification Rule

To classify a new instance \(X = (X_1, X_2, \dots, X_n)\), we compute the posterior probability for each class \(C_k\) and choose the class with the highest probability:

\[
\hat{C} = \arg\max_{C_k} P(C_k) \prod_{i=1}^{n} P(X_i \mid C_k)
\]

Where \(\hat{C}\) is the predicted class.

Types of Naive Bayes Classifiers

There are several types of Naive Bayes classifiers depending on the distribution of the data:

1. Gaussian Naive Bayes: Used when the features are continuous and assumed to follow a
normal (Gaussian) distribution.

2. Multinomial Naive Bayes: Typically used for discrete data such as document classification,
where the features represent counts or frequencies (e.g., word occurrences in text).

3. Bernoulli Naive Bayes: A special case of the Multinomial Naive Bayes used for
binary/Boolean features.

Pros and Cons

Pros:

• Simple and easy to implement.

• Works well with small datasets.

• Performs well in certain domains like text classification.

• Computationally efficient.

Cons:

• Assumes feature independence, which is often unrealistic.

• If a category in the test data is not seen in the training data, it assigns a probability of 0 to that feature, which can be mitigated by techniques like Laplace Smoothing.
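For reference, add-one (Laplace) smoothing adds 1 to every count so that unseen feature values never receive a probability of exactly zero. Writing \(N_C\) for the number of training samples in class \(C\) and \(k\) for the number of possible values of feature \(X_i\) (notation introduced here only for illustration):

\[
P(X_i = v \mid C) = \frac{\operatorname{count}(X_i = v,\ C) + 1}{N_C + k}
\]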

Example in Python (Scikit-Learn)

Here’s a basic implementation of Naive Bayes using the scikit-learn library:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

This example uses the Iris dataset, which contains features like sepal length and petal width, to
predict the species of the flower using a Gaussian Naive Bayes classifier. The model is trained on a
subset of the data, and the accuracy is calculated based on the predictions made on the test set.


8. SVM

Support Vector Machine (SVM) is a popular supervised learning algorithm used for classification,
regression, and outlier detection tasks. It is particularly known for its effectiveness in high-
dimensional spaces and when the number of features exceeds the number of samples.

Key Concepts in SVM:

1. Hyperplane: In SVM, a hyperplane is the decision boundary that separates different classes.
In a 2D space, it's a line; in 3D, it’s a plane. For higher dimensions, it's called a hyperplane.

2. Support Vectors: These are the data points that are closest to the decision boundary
(hyperplane). The SVM algorithm tries to find the optimal hyperplane by maximizing the
margin (distance) between the support vectors of different classes.

3. Margin: The margin is the distance between the hyperplane and the closest support vector
from either class. SVM aims to find the hyperplane with the maximum margin, which
ensures better generalization.
4. Kernel Trick: For non-linearly separable data, SVM uses a "kernel function" to transform the
data into a higher-dimensional space where a linear hyperplane can be used to separate the
classes. Common kernel functions include:

o Linear Kernel: Used when data is linearly separable.

o Polynomial Kernel: Used for polynomial boundaries.

o Radial Basis Function (RBF) Kernel: Popular for non-linear data.

5. Soft Margin SVM: If the data is not linearly separable, a "soft margin" SVM allows some
misclassifications to achieve better generalization by introducing a regularization parameter
C, which controls the trade-off between maximizing the margin and minimizing classification
errors.

SVM Use Cases:

• Text classification: SVM is often used in Natural Language Processing (NLP) tasks, such as spam detection.

• Image recognition: SVM can be applied in image classification tasks, such as facial recognition.

• Bioinformatics: Used for classifying genes or other biological data.

Pros and Cons of SVM:

Pros:

• Effective in high-dimensional spaces.

• Works well when there's a clear margin of separation between classes.

• Memory efficient since it uses a subset of training points (support vectors).

Cons:

• Not well-suited for very large datasets.

• Doesn't perform well when data is noisy or classes are overlapping.

• Choosing the right kernel function can be tricky.

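For completeness, here is a minimal sketch of an SVM classifier in scikit-learn, following the same pattern as the other examples in this document; the Iris dataset, the RBF kernel, and C = 1.0 are arbitrary illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# RBF-kernel SVM; C controls the soft-margin trade-off described above
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)

# Evaluate on the held-out test set
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy: {accuracy * 100:.2f}%")
```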

9. Logistic regression

Logistic Regression is a simple yet powerful statistical method used for binary classification—when
the outcome can take one of two values, like yes/no, spam/not spam, or success/failure. Despite its
name, logistic regression is not actually a regression algorithm; it’s used for classification tasks.

Core Idea
The goal of logistic regression is to model the probability that a given input belongs to a particular
class. For instance, in a spam email classifier, the model predicts the probability that an email is spam
or not.

Why Not Use Linear Regression?

If you try to use linear regression for classification, the predictions might result in any number
(positive or negative), but probabilities should always be between 0 and 1. This is where logistic
regression comes in.

The Logistic Function (Sigmoid Function)

Logistic regression uses a special function called the logistic function (also known as the sigmoid function) to squeeze the output of a linear equation between 0 and 1, making it suitable for probability predictions.
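Written out (using the weight notation \(w_0, w_1, \dots, w_n\) that appears in the training step below), the predicted probability for input features \(x_1, \dots, x_n\) is:

\[
P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
\]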

Step-by-Step Explanation

1. Input Features: The model takes in one or more features (input variables). For example, if we
are predicting whether an email is spam, the features could be things like the number of
links, the presence of certain keywords, or the email length.
Training the Model
The weights \(w_0, w_1, \dots, w_n\) are learned during training using a method called maximum likelihood estimation (MLE). This essentially means finding the parameters that maximize the likelihood of the observed data.

Performance Metrics

For logistic regression, performance can be evaluated using:

• Accuracy: The fraction of correct predictions.

• Precision, Recall, F1-score: Metrics that consider false positives and false negatives, which are important in imbalanced datasets.

• ROC Curve & AUC: These help visualize how well the model distinguishes between classes across various thresholds.

Example in Python (Scikit-Learn)

Here’s a basic implementation of logistic regression using the scikit-learn library:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Logistic Regression model
log_reg = LogisticRegression(max_iter=1000)

# Train the model
log_reg.fit(X_train, y_train)

# Make predictions on the test set
y_pred = log_reg.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

This example uses the breast cancer dataset to predict whether a tumor is malignant or benign. The
accuracy of the model on the test set is then calculated.

Pros and Cons

Pros:

• Simple and interpretable.

• Works well for binary classification.

• Can output probabilities for the classes.

• Computationally efficient and fast to train.

• Does not require a large number of parameters.

Cons:

• Assumes linearity between the input features and the log-odds of the outcome.

• May not perform well with complex relationships unless interactions or transformations are added.

• Sensitive to outliers, though less so than linear regression.

Conclusion

Logistic regression is a great starting point for classification problems, especially when the
relationship between the input features and the target variable is approximately linear. It's easy to
interpret and can be extended to handle multiple classes (with multinomial logistic regression).
