ML Mid1 Myans
One of the main issues in Machine Learning is the absence of good data. While enhancing their algorithms, developers end up exhausting most of their time dealing with this data problem rather than with the model itself.
Although AI-driven software helps to successfully detect credit card fraud, for example, there are issues in Machine Learning that can make the process ineffective.
Although numerous individuals are drawn into the ML business, there are still not many experts who have complete command of this technology.
5) Implementation
Organizations often already have analytics engines in place when they decide to move up to ML. Integrating newer ML techniques with these existing processes is a complicated task.
6) Missing Data
Most ML models cannot handle datasets containing missing data points. As a result, features that contain a large proportion of missing values often have to be dropped.
7) Deficient Infrastructure
ML requires a tremendous amount of data-processing capability. Legacy systems cannot handle the workload and buckle under the pressure.
8) Having Algorithms Become Obsolete When Data Grows
ML algorithms will always require a large amount of data for training. Frequently, an algorithm is trained on a specific dataset and then used to predict future data, a process that is hard to keep working correctly as the data grows and changes.
9) Absence Of Skilled Resources
Another issue in Machine Learning is that deep analytics and ML, in their present form, are still relatively new technologies, so skilled professionals remain scarce.
• Neural Networks
• Naive Bayesian Model
• Classification
• Support Vector Machines
• Regression
• Random Forest Model
11) Complexity
Although Machine Learning and Artificial Intelligence are booming, much of the field is still in an experimental phase, proceeding largely by trial and error.
13) Maintenance
The required results for different actions are bound to change over time, and hence the data needed to produce them changes as well, which makes ongoing maintenance necessary.
This occurs when the target variable changes over time (concept drift), so the delivered results become inaccurate. This causes the models to decay, because they cannot easily adapt to, or be upgraded for, such changes.
This occurs when certain aspects of a dataset are given more importance (weight) than others.
Many algorithms contain biased programming, which leads to biased datasets; such a model will not deliver the right output and will produce irrelevant information.
Machine Learning is often termed a "black box", because deciphering how an algorithm reached its outcome is often complex, and the explanation is sometimes of little use.
2. Distinguish between training loss vs testing loss.
Training loss is the error a model makes on the data it was fitted on, while testing loss is the error measured on held-out data the model has never seen. Training loss tells us how well the model has learned the training set; testing loss estimates how well it will generalize to new data. A low training loss combined with a much higher testing loss is the classic sign of overfitting.
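A minimal sketch (assuming numpy and scikit-learn are available; the synthetic data and the degree-15 polynomial are only chosen to exaggerate the gap) that makes the distinction concrete:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A deliberately flexible model (degree-15 polynomial) that overfits the training set
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

train_loss = mean_squared_error(y_train, model.predict(X_train))
test_loss = mean_squared_error(y_test, model.predict(X_test))
print(f"Training MSE: {train_loss:.3f}   Testing MSE: {test_loss:.3f}")
```

The training MSE here plays the role of training loss and the held-out MSE the role of testing loss.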
3. Build the K-Nearest Neighbor learning algorithm with an example dataset.
The *K-Nearest Neighbors (K-NN)* algorithm is a simple, supervised learning algorithm used for both
classification and regression. The basic idea is that given a new data point, the algorithm looks at the
"k" closest data points (neighbors) from the training set and makes a prediction based on majority
voting (classification) or averaging (regression).
1. *Choose K*, the number of nearest neighbors to consider.
2. *Calculate the distance* between the new data point and all the data points in the training set.
3. *Select the K-nearest neighbors* (the data points with the smallest distances to the new data
point).
4. *For classification*: Assign the class that is most common among the K-nearest neighbors.
5. *For regression*: Predict the output by averaging the values of the K-nearest neighbors.
Let’s use a simple 2D dataset where each data point has two features (e.g., height and weight), and
we want to classify the data points into two classes: 0 or 1.
#### Dataset:
| Height (cm) | Weight (kg) | Class (0/1) |
|-------------|-------------|-------------|
| 160 | 55 |0 |
| 170 | 65 |0 |
| 180 | 75 |1 |
| 175 | 70 |1 |
| 155 | 50 |0 |
| 165 | 60 |0 |
| 185 | 80 |1 |
| 190 | 85 |1 |
Now, let's implement the K-NN algorithm and use it to classify a new point with the features:
- *Height = 178 cm*
- *Weight = 72 kg*
1. *Choose K*:
Let's set \( K = 3 \). This means the algorithm will find the 3 closest data points to the new point.
2. *Calculate the distances*:
We'll use *Euclidean distance* to measure the distance between the new point and each data point in the dataset. For two points \((x_1, y_1)\) and \((x_2, y_2)\), the Euclidean distance is given by:
\[
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
\]
Now, we calculate the distances from this new point to each data point in the dataset:
| Height (cm) | Weight (kg) | Class | Distance to (178, 72) |
|-------------|-------------|-------|-----------------------|
| 160 | 55 | 0 | 24.76 |
| 170 | 65 | 0 | 10.63 |
| 180 | 75 | 1 | 3.61 |
| 175 | 70 | 1 | 3.61 |
| 155 | 50 | 0 | 31.83 |
| 165 | 60 | 0 | 17.69 |
| 185 | 80 | 1 | 10.63 |
| 190 | 85 | 1 | 17.69 |
3. *Select the 3 nearest neighbors*: (180, 75) and (175, 70), both class 1, and (170, 65), class 0. (The point (185, 80) is equally close at 10.63; choosing it instead would not change the outcome.)
By majority voting, the new data point (178, 72) is classified as class *1*.
The new data point with Height = 178 cm and Weight = 72 kg is classified as *Class 1* (based on the majority of its 3 nearest neighbors).
### Summary:
- The K-NN algorithm classifies new data points based on the majority class of the K nearest
neighbors.
- In this example, the new point (178 cm, 72 kg) was classified as class *1* because two out of its
three nearest neighbors were from class 1.
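For completeness, here is a minimal sketch of the same example in Python (assuming scikit-learn is available); it should reproduce the hand calculation above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data from the table above: [height_cm, weight_kg] -> class
X = np.array([[160, 55], [170, 65], [180, 75], [175, 70],
              [155, 50], [165, 60], [185, 80], [190, 85]])
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# K = 3 with the default (Euclidean) distance metric
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[178, 72]]))  # expected: [1], matching the hand calculation
```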
Classification and regression are two fundamental types of predictive modeling techniques used in
machine learning. Both involve predicting an output variable based on one or more input variables,
but they differ in the type of output they produce:
Classification is used when the output variable is categorical (e.g., yes/no, spam/not spam).
Regression is used when the output variable is continuous (e.g., price, temperature).
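As a quick illustration (a sketch assuming scikit-learn; the feature values are made up), the same kind of input can feed either a classifier or a regressor depending on the type of target:

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[1200], [1500], [1800], [2200]]        # square footage (hypothetical values)
y_class = [0, 0, 1, 1]                      # categorical target: overpriced or not
y_price = [150000, 180000, 220000, 270000]  # continuous target: sale price

clf = LogisticRegression().fit(X, y_class)  # classification
reg = LinearRegression().fit(X, y_price)    # regression

print(clf.predict([[2000]]))  # a class label (0 or 1)
print(reg.predict([[2000]]))  # a continuous number
```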
Let’s examine how regression is applied in a classification context using an example of predicting
whether a house is overpriced or not based on various features.
• Square footage
• Number of bedrooms
• Number of bathrooms
• Location
• Age of the house
Collect data on various houses, including their features and whether they were classified as
overpriced based on their sale price compared to market averages.
| Square Footage | Bedrooms | Bathrooms | Location | Age | Overpriced |
|----------------|----------|-----------|----------|-----|------------|
| 1500 | 2 | 1 | B | 20 | 0 |
| 2000 | 3 | 2 | A | 5 | 0 |
| 2500 | 4 | 3 | B | 10 | 1 |
| 1800 | 3 | 2 | A | 15 | 0 |
| 3000 | 5 | 4 | C | 1 | 1 |
• Logistic Regression
• Decision Trees
• Random Forest
For this example, let’s choose Logistic Regression, which is often used for binary classification
problems.
Encoding Categorical Variables: Convert categorical variables like "Location" into numerical
values (e.g., using one-hot encoding).
1. Split the Data: Divide the dataset into training and test sets (e.g., 80% training, 20% test).
2. Train the Model: Use the training set to fit the logistic regression model.
After training the model, we evaluate its performance using the test set:
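A minimal sketch of these steps, assuming pandas and scikit-learn are available; the DataFrame simply re-creates the tiny illustrative table above, so the train/test split here is only for demonstration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# The tiny illustrative dataset from the table above
df = pd.DataFrame({
    "sqft":       [1500, 2000, 2500, 1800, 3000],
    "bedrooms":   [2, 3, 4, 3, 5],
    "bathrooms":  [1, 2, 3, 2, 4],
    "location":   ["B", "A", "B", "A", "C"],
    "age":        [20, 5, 10, 15, 1],
    "overpriced": [0, 0, 1, 0, 1],
})

# One-hot encode the categorical "location" column
X = pd.get_dummies(df.drop(columns="overpriced"), columns=["location"])
y = df["overpriced"]

# 80/20 split (only meaningful with a real, larger dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit logistic regression and evaluate on the held-out row(s)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```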
Step 7: Interpretation of Results
Conclusion
Ensemble learning comes with its own practical challenges: selecting the right combination of models to include in the ensemble, determining the optimal
weighting of each model's predictions, and managing the computational resources required to train
and evaluate multiple models simultaneously. Additionally, ensemble learning may not always
improve performance if the individual models are too similar or if the training data has a high degree
of noise. The diversity of the models—in terms of algorithms, feature processing, and data
perspectives—is vital to covering a broader spectrum of data patterns. Optimal weighting of each
model's contribution, often based on performance metrics, is crucial to harnessing their collective
predictive power. Therefore, careful consideration and experimentation are necessary to achieve the
desired results with ensemble learning.
Computational Complexity
Ensemble learning, involving multiple algorithms and feature sets, requires more computational
resources than individual models. While parallel processing offers a solution, orchestrating an
ensemble of models across multiple processors can introduce complexity in both implementation
and maintenance. Also, more computation might not always lead to better performance, especially if
the ensemble is not set up correctly or if the models amplify each other's errors in noisy datasets.
Model Diversity
Ensemble learning requires diverse models to avoid bias and enhance accuracy. By incorporating
different algorithms, feature sets, and training data, ensemble learning captures a wider range of
patterns, reducing the risk of overfitting and ensuring the ensemble can handle various scenarios and
make accurate predictions in different contexts. Strategies such as cross-validation help in evaluating
the ensemble's consistency and reliability, ensuring the ensemble is robust against different data
scenarios.
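As a rough illustration of these two ideas (diversity across algorithm families, and cross-validation as a robustness check), here is a minimal sketch assuming scikit-learn and its built-in breast cancer dataset; the particular estimators, parameters, and soft-voting scheme are only example choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Three different algorithm families give the ensemble diverse views of the data
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities; per-model weights could also be set
)

# Cross-validation checks how consistent the ensemble is across different data splits
scores = cross_val_score(ensemble, X, y, cv=5)
print("Accuracy per fold:", scores.round(3))
```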
Interpretability
Ensemble learning models prioritize accuracy over interpretability, resulting in highly accurate
predictions. However, this trade-off makes the ensemble model more challenging to interpret.
Techniques like feature importance analysis and model introspection can help provide insights but
may not fully demystify the predictions of complex ensembles; at best, they highlight the factors contributing to an ensemble model's decision-making, which reduces the interpretability challenge.
7. Naive Bayes
The Naive Bayes classifier is a probabilistic machine learning model based on Bayes' Theorem, which
assumes that the features (predictors) are independent of each other, given the class label. Despite
this simplifying assumption (hence "naive"), it performs quite well in various real-world applications
such as spam filtering, document classification, and medical diagnosis.
Key Components
1. Bayes' Theorem
Bayes' Theorem provides a way to update the probability estimate for a hypothesis given new evidence. Mathematically, it's expressed as:
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
Where:
• \(P(A \mid B)\) is the posterior probability: the probability of the hypothesis \(A\) being true given the evidence \(B\).
• \(P(B \mid A)\) is the likelihood: the probability of the evidence \(B\) given that the hypothesis \(A\) is true.
• \(P(A)\) is the prior probability: the initial probability of the hypothesis \(A\) before seeing the evidence.
• \(P(B)\) is the evidence: the overall probability of observing \(B\).
2. Conditional Independence Assumption
In a Naive Bayes classifier, we assume that the features (evidence) are conditionally independent given the class label. This simplifies the computation of the likelihood:
\[
P(X_1, X_2, \dots, X_n \mid C_k) = P(X_1 \mid C_k)\, P(X_2 \mid C_k) \cdots P(X_n \mid C_k)
\]
Where \(X_1, \dots, X_n\) are the features and \(C_k\) is the class label.
3. Classification Rule
To classify a new instance \(X = (X_1, X_2, \dots, X_n)\), we compute the posterior probability for each class \(C_k\) and choose the class with the highest probability:
\[
\hat{C} = \arg\max_{C_k} \; P(C_k) \prod_{i=1}^{n} P(X_i \mid C_k)
\]
There are several types of Naive Bayes classifiers depending on the distribution of the data:
1. Gaussian Naive Bayes: Used when the features are continuous and assumed to follow a
normal (Gaussian) distribution.
2. Multinomial Naive Bayes: Typically used for discrete data such as document classification,
where the features represent counts or frequencies (e.g., word occurrences in text).
3. Bernoulli Naive Bayes: A special case of the Multinomial Naive Bayes used for
binary/Boolean features.
Pros:
Computationally efficient.
Cons:
If a category in the test data is not seen in the training data, it assigns a probability of 0 to
that feature, which can be mitigated by techniques like Laplace Smoothing.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Gaussian Naive Bayes classifier and predict on the test set
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Evaluate accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
```
This example uses the Iris dataset, which contains features like sepal length and petal width, to
predict the species of the flower using a Gaussian Naive Bayes classifier. The model is trained on a
subset of the data, and the accuracy is calculated based on the predictions made on the test set.
8. SVM
Support Vector Machine (SVM) is a popular supervised learning algorithm used for classification,
regression, and outlier detection tasks. It is particularly known for its effectiveness in high-
dimensional spaces and when the number of features exceeds the number of samples.
1. Hyperplane: In SVM, a hyperplane is the decision boundary that separates different classes.
In a 2D space, it's a line; in 3D, it’s a plane. For higher dimensions, it's called a hyperplane.
2. Support Vectors: These are the data points that are closest to the decision boundary
(hyperplane). The SVM algorithm tries to find the optimal hyperplane by maximizing the
margin (distance) between the support vectors of different classes.
3. Margin: The margin is the distance between the hyperplane and the closest support vector
from either class. SVM aims to find the hyperplane with the maximum margin, which
ensures better generalization.
4. Kernel Trick: For non-linearly separable data, SVM uses a "kernel function" to transform the
data into a higher-dimensional space where a linear hyperplane can be used to separate the
classes. Common kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
5. Soft Margin SVM: If the data is not linearly separable, a "soft margin" SVM allows some
misclassifications to achieve better generalization by introducing a regularization parameter
C, which controls the trade-off between maximizing the margin and minimizing classification
errors.
Text classification: SVM is often used in Natural Language Processing (NLP) tasks, such as
spam detection.
Image recognition: SVM can be applied in image classification tasks, such as facial
recognition.
Pros:
• Effective in high-dimensional spaces, even when the number of features exceeds the number of samples.
• Memory efficient, since the decision function depends only on the support vectors.
Cons:
• Training can be slow on very large datasets.
• Performance depends heavily on the choice of kernel and the regularization parameter C, and SVMs do not directly provide probability estimates.
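As a rough illustration of the ideas above (an RBF kernel plus the soft-margin parameter C), here is a minimal sketch assuming scikit-learn; the dataset and parameter values are placeholders rather than a tuned solution:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling matters for SVMs; C controls the soft-margin trade-off
# and the RBF kernel handles data that is not linearly separable.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```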
9. Logistic regression
Logistic Regression is a simple yet powerful statistical method used for binary classification—when
the outcome can take one of two values, like yes/no, spam/not spam, or success/failure. Despite its
name, logistic regression is not actually a regression algorithm; it’s used for classification tasks.
Core Idea
The goal of logistic regression is to model the probability that a given input belongs to a particular
class. For instance, in a spam email classifier, the model predicts the probability that an email is spam
or not.
If you try to use linear regression for classification, the predictions might result in any number
(positive or negative), but probabilities should always be between 0 and 1. This is where logistic
regression comes in.
Logistic regression uses a special function called the logistic function (also known as the sigmoid
function) to squeeze the output of a linear equation between 0 and 1, making it suitable for
probability predictions.
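Concretely, the model passes a linear combination of the input features through the sigmoid:
\[
P(y = 1 \mid X) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
\]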
Step-by-Step Explanation
1. Input Features: The model takes in one or more features (input variables). For example, if we
are predicting whether an email is spam, the features could be things like the number of
links, the presence of certain keywords, or the email length.
Training the Model
The weights \(w_0, w_1, \dots, w_n\) are learned during training using a
method called maximum likelihood estimation (MLE). This essentially means finding the parameters
that maximize the likelihood of the observed data.
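In other words, training chooses the weights that maximize the log-likelihood of the observed labels:
\[
\ell(w) = \sum_{i=1}^{m} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log\big(1 - \hat{p}_i\big) \Big], \qquad \hat{p}_i = \sigma\big(w_0 + w_1 x_{i1} + \dots + w_n x_{in}\big)
\]
where \(y_i \in \{0, 1\}\) is the label of the \(i\)-th training example and \(m\) is the number of examples.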
Performance Metrics
Precision, Recall, F1-score: Metrics that consider false positives and false negatives, which
are important in imbalanced datasets.
ROC Curve & AUC: These help visualize how well the model distinguishes between classes
across various thresholds.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset and split into training and test sets
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = log_reg.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
This example uses the breast cancer dataset to predict whether a tumor is malignant or benign. The
accuracy of the model on the test set is then calculated.
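To go beyond plain accuracy, the metrics listed under Performance Metrics can be computed on the same test split; a small sketch (assuming scikit-learn, and reusing log_reg, X_test, y_test, and y_pred from the example above):

```python
from sklearn.metrics import classification_report, roc_auc_score

# Precision, recall, and F1-score per class
print(classification_report(y_test, y_pred))

# ROC AUC uses predicted probabilities for the positive class
y_prob = log_reg.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, y_prob))
```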
Pros:
• Simple, fast to train, and easy to interpret.
• Outputs class probabilities rather than just labels.
Cons:
• Assumes linearity between the input features and the log-odds of the outcome.
• May not perform well with complex relationships unless interactions or transformations are added.
Conclusion
Logistic regression is a great starting point for classification problems, especially when the
relationship between the input features and the target variable is approximately linear. It's easy to
interpret and can be extended to handle multiple classes (with multinomial logistic regression).