ML Notes MAKAUT 7th Sem
What is Machine Learning? - Machine learning is a branch of artificial intelligence (AI) that enables computer systems to learn from data and make decisions without being explicitly programmed. Instead of being programmed with specific instructions, a machine learning system uses algorithms and statistical models to analyze data, identify patterns, and make predictions or decisions.
• In simpler terms, machine learning is like teaching a computer to learn from examples and experiences, allowing it to improve its performance over time. The process involves feeding the system a large amount of data, and the machine learning algorithm learns from this data to recognize patterns and make predictions or decisions without being explicitly programmed for each task.
Categories of Machine Learning: Machine learning is broadly categorized into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Future Trends in Machine Learning:
1. Advancements in Deep Learning: Continued progress in deep learning techniques, architectures, and
training methods is expected. This will enable more sophisticated and efficient models for tasks such as
image recognition, natural language processing, and reinforcement learning.
2. Explainable AI (XAI): As AI and ML systems become more prevalent in critical decision-making
processes, there is a growing demand for models that are transparent and explainable. XAI focuses on
making machine learning models more interpretable and understandable, enhancing trust and
accountability.
3. Automated Machine Learning (AutoML): The development of tools and platforms for automated
machine learning is on the rise. AutoML aims to streamline the machine learning pipeline, automating
tasks such as feature engineering, model selection, and hyperparameter tuning, making ML more
accessible to non-experts.
4. Edge Computing and Federated Learning: The integration of machine learning with edge computing
allows for processing data closer to the source, reducing latency and improving efficiency. Federated
learning, where models are trained across decentralized devices, enables privacy-preserving machine
learning by keeping data localized.
5. AI in Healthcare: Machine learning is expected to play a significant role in advancing healthcare, from
personalized medicine and drug discovery to predictive analytics and medical imaging. ML models can
help in diagnosing diseases, predicting patient outcomes, and optimizing treatment plans.
6. Ethics and Responsible AI: As ML systems are deployed in various applications, there is an increasing
focus on ethical considerations and responsible AI practices. This includes addressing issues of bias,
fairness, accountability, and ensuring that AI technologies benefit society as a whole.
7. Natural Language Processing (NLP) Advances: Progress in natural language processing, including
language understanding, generation, and contextual understanding, will lead to more sophisticated
applications in chatbots, virtual assistants, sentiment analysis, and language translation.
8. Continued Integration with Other Technologies: Machine learning will continue to integrate with other
emerging technologies such as blockchain, augmented reality (AR), virtual reality (VR), and the Internet
of Things (IoT), creating synergies and enabling innovative applications.
9. Quantum Machine Learning: Research in quantum computing may impact machine learning in the
future. Quantum computing has the potential to solve complex problems more efficiently than classical
computers, offering new possibilities for machine learning algorithms.
10. Increased Industry Adoption: Industries across the board are recognizing the value of machine learning
in optimizing processes, improving decision-making, and gaining insights from data. Increased adoption
is expected across sectors such as finance, manufacturing, retail, and more.
What is IoT (Internet of Things)? - IoT, or the Internet of Things, refers to the network of physical
devices embedded with sensors, software, and connectivity features that enable them to collect and
exchange data. These devices can range from everyday objects like household appliances and wearable
devices to industrial machinery. The goal of IoT is to enhance efficiency, automation, and decision-
making by enabling these devices to communicate with each other and with centralized systems.
➢ Connection between IoT and Machine Learning: The connection between IoT and machine
learning lies in the ability to analyze and derive meaningful insights from the massive amounts of
data generated by IoT devices. Here's a simplified explanation:
1. Data Collection from IoT Devices: IoT devices generate vast amounts of data as they collect
information from their surroundings. For example, smart thermostats can collect data on
temperature patterns,
fitness trackers can monitor physical activities, and industrial sensors can track machine
performance.
2. Data Processing and Analysis: Machine learning algorithms are applied to process and analyze
the data collected by IoT devices. These algorithms can identify patterns, trends, and anomalies
within the data, providing valuable insights that might be challenging for traditional
programming to uncover.
3. Smart Decision-Making: With the help of machine learning, IoT systems can make intelligent
decisions based on the analyzed data. For instance, a smart home system might learn a user's
preferences and adjust the thermostat automatically, or an industrial IoT system could predict
equipment failures and
schedule maintenance proactively.
4. Adaptive Systems: Machine learning allows IoT systems to adapt and improve over time. As
more data is collected, the algorithms can continuously learn and refine their models, leading to
better predictions and decisions.
In essence, machine learning enhances the capabilities of IoT systems by making them smarter,
more adaptive, and capable of deriving actionable insights from the wealth of data generated by
IoT devices.
➢ ML in Cyber Security: Machine learning is increasingly applied to cybersecurity to enhance
the detection and prevention of cyber threats. Here's a simplified explanation of how
machine learning is used in cybersecurity:
1. Anomaly Detection: Machine learning algorithms can be trained on normal patterns of
behavior within a computer network. When there is an abnormal deviation from these
patterns, the system can flag it as a potential security threat. For example, unusual login times,
atypical data access patterns, or unexpected network traffic could indicate a cyber-attack.
2. Pattern Recognition: Machine learning algorithms excel at recognizing patterns in large
datasets. In cybersecurity, these algorithms can analyze historical data to identify common
attack patterns or signatures associated with known malware or malicious activities.
3. Malware Detection: Machine learning models can be trained to recognize the characteristics of
malware based on features like file behavior, code analysis, or network behavior. This enables
systems to detect and block new, previously unseen malware based on its similarities to known
malicious patterns.
4. Phishing Detection: Phishing attacks often involve deceptive emails or websites designed to
trick users into revealing sensitive information. Machine learning can analyze the content of
emails, URLs, and other communication to identify phishing attempts by recognizing
suspicious patterns and content.
5. User Behavior Analysis: By monitoring and analyzing user behavior, machine learning can
identify unusual activities that may indicate compromised accounts. For example, if a user
suddenly accesses sensitive data they have never accessed before, it could be a sign of
unauthorized access.
6. Automated Response: Machine learning models can enable automated responses to certain
types of cyber threats. For instance, if a system detects a known pattern of attack, it can
automatically trigger actions such as isolating affected devices, blocking malicious IP
addresses, or updating security
configurations.
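To make the anomaly-detection idea concrete, here is a minimal sketch using scikit-learn's IsolationForest on synthetic "network traffic" features; the feature choices and values are invented purely for illustration:
```python
# Sketch: unsupervised anomaly detection on network-style features with an
# Isolation Forest (synthetic "normal" traffic plus a few injected outliers).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_traffic = rng.normal(loc=[50, 500], scale=[5, 50], size=(500, 2))  # e.g. [logins/hour, KB transferred]
attacks = np.array([[3, 5000], [120, 9000]])                              # unusual patterns
X = np.vstack([normal_traffic, attacks])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)              # -1 = anomaly, +1 = normal

print("flagged as anomalies (row indices):", np.where(flags == -1)[0])
```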
How Do Machines Learn? The basic components of the learning process are data storage, abstraction, generalization and evaluation.
1. Data storage: Facilities for storing and retrieving huge amounts of data are an important component of the learning process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store data
and use cables and other technology to retrieve data.
2. Abstraction: The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves the application of known models
and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the data
is transformed into an abstract form that summarizes the original information.
3. Generalization: The third component of the learning process is known as generalization.
The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before. In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
4. Evaluation: Evaluation is the last component of the learning process. It is the process of giving feedback
to the user to measure the utility of the learned knowledge. This feedback is then utilized to effect
improvements in the whole learning process.
Supervised Learning vs Unsupervised Learning:
• Supervised learning algorithms are trained using labelled data, whereas unsupervised learning algorithms are trained using unlabelled data.
• A supervised learning model takes direct feedback to check whether it is predicting the correct output; an unsupervised learning model does not take any feedback.
• A supervised learning model predicts the output, whereas an unsupervised learning model finds the hidden patterns in data.
• In supervised learning, input data is provided to the model along with the output; in unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can predict the output when given new data; the goal of unsupervised learning is to find hidden patterns and useful insights from the unknown dataset.
• Supervised learning needs supervision to train the model; unsupervised learning does not need any supervision.
• Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs; unsupervised learning can be used for cases where we have only input data and no corresponding output data.
• A supervised learning model generally produces an accurate result; an unsupervised learning model may give a less accurate result in comparison.
• Supervised learning is not close to true Artificial Intelligence, as we first train the model on each example and only then can it predict the correct output; unsupervised learning is closer to true Artificial Intelligence, as it learns in the way a child learns daily routine things from experience.
Regression vs Classification:
• In Regression, the output variable must be continuous or a real value; in Classification, the output variable must be a discrete value.
• The task of a regression algorithm is to map the input value (x) to a continuous output variable (y); the task of a classification algorithm is to map the input value (x) to a discrete output variable (y).
• Regression uses independent variables and a continuous dependent variable; classification uses independent variables and a categorical dependent variable.
• In Regression, we try to find the best-fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes.
• Regression algorithms are used to solve regression problems such as weather prediction and house price prediction; classification algorithms are used to solve classification problems such as identification of spam emails, speech recognition, and identification of cancer cells.
• Regression algorithms can be further divided into Linear and Non-linear Regression; classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers.
Decision Trees
o Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
o In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further splits into subtrees.
How does the Decision Tree algorithm work?
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values for the best attribute.
o Step-4: Generate the decision tree node which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; call the final node a leaf node.
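A minimal sketch of these steps using scikit-learn's DecisionTreeClassifier, which implements an optimized CART; the Iris dataset and the depth limit here are arbitrary illustration choices:
```python
# Minimal sketch: building a CART decision tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)                      # Step-1: the complete dataset S
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 2-5: the classifier repeatedly picks the best attribute (here via the Gini
# index, an Attribute Selection Measure), splits the data, and recurses until leaves.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=load_iris().feature_names))
print("test accuracy:", tree.score(X_test, y_test))
```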
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after
getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes; the root node is the topmost parent node.
Advantages of the Decision Tree:
• It is simple to understand, as it follows the same process that a human follows while making any decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement for data cleaning compared to other algorithms.
Attribute Selection Measures (ASM):
Information Gain: Information gain measures how much a particular feature contributes to reducing uncertainty in predicting the outcome. In decision trees, it helps decide the order in which features are used to split the data.
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy: Entropy is a measure of disorder or randomness in a set of data. In decision trees, entropy is used to calculate the information gain. Lower entropy implies a more ordered and predictable set of data.
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Gini Index: The Gini index quantifies the impurity or disorder of a set of data points. In decision trees, it is used as a criterion for selecting the best feature to split the data. A lower Gini index indicates a purer and more homogeneous set.
Gini Index = 1 - ∑j (Pj)²
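To make the formulas concrete, here is a small hand-worked computation of Entropy, Information Gain, and the Gini index on a made-up split of 14 examples (pure Python, illustrative only):
```python
import math

def entropy(p_yes: float, p_no: float) -> float:
    """Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no); 0*log2(0) is treated as 0."""
    total = 0.0
    for p in (p_yes, p_no):
        if p > 0:
            total -= p * math.log2(p)
    return total

# Toy dataset: 14 examples, 9 "yes" and 5 "no" (an invented example).
S_entropy = entropy(9 / 14, 5 / 14)

# Suppose a feature splits S into two subsets:
#   subset A: 8 examples (6 yes, 2 no), subset B: 6 examples (3 yes, 3 no)
weighted = (8 / 14) * entropy(6 / 8, 2 / 8) + (6 / 14) * entropy(3 / 6, 3 / 6)

info_gain = S_entropy - weighted             # Information Gain = Entropy(S) - weighted avg entropy
gini = 1 - ((9 / 14) ** 2 + (5 / 14) ** 2)   # Gini Index = 1 - sum_j (P_j)^2

print(f"Entropy(S) = {S_entropy:.3f}, Information Gain = {info_gain:.3f}, Gini = {gini:.3f}")
```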
Linear Regression vs Logistic Regression:
• Linear regression is used to predict a continuous dependent variable from a given set of independent variables; logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
• Linear regression is used for solving regression problems; logistic regression is used for solving classification problems.
• In linear regression, we predict the values of continuous variables; in logistic regression, we predict the values of categorical variables.
• In linear regression, we find the best-fit line, by which we can easily predict the output; in logistic regression, we find the S-curve, by which we can classify the samples.
• The least squares method is used to estimate the parameters of linear regression; the maximum likelihood estimation method is used for logistic regression.
• The output of linear regression must be a continuous value, such as price or age; the output of logistic regression must be a categorical value such as 0 or 1, Yes or No.
• In linear regression, there may be collinearity between the independent variables; in logistic regression, there should not be collinearity between the independent variables.
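A short scikit-learn sketch contrasting the two models; the "price" and "spam" targets below are synthetic and purely illustrative:
```python
# Sketch: linear regression fits a best-fit line for a continuous target,
# logistic regression fits an S-curve (sigmoid) for a 0/1 target.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))

# Continuous target, e.g. a "price": linear regression (least squares).
y_price = 3.0 * X.ravel() + 5 + rng.normal(scale=2.0, size=100)
lin = LinearRegression().fit(X, y_price)

# Binary target, e.g. "spam or not": logistic regression (maximum likelihood).
y_class = (X.ravel() + rng.normal(scale=1.0, size=100) > 5).astype(int)
log = LogisticRegression().fit(X, y_class)

print("predicted value  :", lin.predict([[4.0]])[0])           # a real number
print("predicted class  :", log.predict([[4.0]])[0])           # 0 or 1
print("class probability:", log.predict_proba([[4.0]])[0, 1])  # from the S-curve
```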
Generalized Linear Models (GLMs)
Generalized Linear Models (GLMs) are a class of statistical models that extend linear regression to
accommodate non-normally distributed response variables and apply to a broader range of data
distributions. While GLMs have roots in traditional statistics, they are also used in machine learning for
various tasks.
1. Linear Predictor: Like linear regression, GLMs have a linear combination of input features, but instead
of predicting the outcome directly, this linear combination is transformed using a link function.
2. Link Function: The link function connects the linear predictor to the mean of the response variable. It
defines the relationship between the linear combination of input features and the expected value of
the response variable.
3. Probability Distribution: GLMs can handle different probability distributions for the response variable.
Common distributions include:
• Gaussian (Normal): For continuous outcomes.
• Binomial: For binary outcomes (success/failure).
• Poisson: For count data.
• And others like Gamma, Tweedie, etc.
Steps in a GLM:
1. Define the Model: Specify the form of the linear predictor and choose a link function based on the
nature of the response variable.
2. Estimate Parameters: Use statistical techniques (usually maximum likelihood estimation) to find the
values of the model parameters that maximize the likelihood of the observed data.
3. Link Function Transformation: Apply the link function to transform the linear combination of features
into a prediction that aligns with the distribution of the response variable.
4. Make Predictions: Use the model to make predictions on new data by applying the learned parameters
and the link function.
Example:
Consider a binary classification problem where you want to predict whether an email is spam or not based
on the length of the email. A GLM for this problem might involve a logistic regression model (link function:
logistic) with a binomial distribution.
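A sketch of that spam example as a GLM, assuming the statsmodels library is available; the email-length data is synthetic and the coefficients are illustrative only:
```python
# Sketch: a GLM with a Binomial family and logit link (i.e. logistic regression),
# predicting spam (1) vs not spam (0) from email length. Data is synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
length = rng.uniform(20, 500, size=200)                  # email length in words
p_spam = 1 / (1 + np.exp(-(0.01 * length - 2)))          # true underlying S-curve
is_spam = rng.binomial(1, p_spam)

X = sm.add_constant(length)                              # linear predictor: b0 + b1*length
model = sm.GLM(is_spam, X, family=sm.families.Binomial())  # link defaults to logit
result = model.fit()                                     # maximum likelihood estimation

print(result.params)                                     # fitted b0, b1
print(result.predict(sm.add_constant(np.array([50.0, 400.0]))))  # P(spam) for new emails
```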
Advantages of GLMs:
Support Vector Machine (SVM): The best hyperplane is the plane that has the maximum distance from both classes, and finding it is the main aim of SVM. This is done by generating different hyperplanes that classify the labels in the best way and then choosing the one which is farthest from the data points, i.e. the one which has the maximum margin.
Advantages:
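A minimal scikit-learn sketch of the maximum-margin idea described above, using a linear SVM on synthetic, well-separated data:
```python
# Sketch: a linear SVM chooses the separating hyperplane with the maximum margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1.0)     # linear kernel: a flat separating hyperplane
clf.fit(X, y)

print("hyperplane weights:", clf.coef_[0], "bias:", clf.intercept_[0])
print("support vectors (points closest to the margin):", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```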
Multi-Class Classification: Multi-class classification involves assigning items into one of three or more
classes or categories. Each item can belong to only one class, and the goal is to correctly assign items to
their respective classes.
Examples: Handwritten digit recognition (classifying digits 0-9). Species classification in biology
(classifying animals into different species). News categorization (assigning news articles to categories like
politics, sports, entertainment).
Algorithms:
Logistic Regression (with one-vs-rest or softmax), Support Vector Machines, Decision Trees, Random
Forests, and Neural Networks are commonly used for multi-class classification.
Evaluation Metrics:
Accuracy, Precision, Recall, F1 Score, and confusion matrices can still be used in multi-class classification,
but they may be extended or adapted to handle multiple classes.
One-vs-Rest (OvR) and One-vs-One (OvO) Strategies:
• OvR: Trains a separate classifier for each class, treating it as the positive class and the rest as the
negative class. The final prediction is the class with the highest confidence.
• OvO: Constructs a binary classifier for each pair of classes. The final prediction is the class that
wins the most pairwise comparisons.
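A short sketch of the OvR strategy with scikit-learn's OneVsRestClassifier wrapping logistic regression; the three-class data is synthetic and illustrative only:
```python
# Sketch: One-vs-Rest trains one binary classifier per class and picks the
# class whose classifier is most confident.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

print("number of binary classifiers:", len(ovr.estimators_))  # one per class
print("predicted classes:", ovr.predict(X[:5]))
```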
Considerations:
• The choice between binary and multi-class classification depends on the nature of the problem and the
number of classes involved.
• Algorithms designed for binary classification can be extended to handle multi-class problems using
various strategies, as mentioned above.
Algorithms used:
• The most popular algorithms used for binary classification are: Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine, and Naive Bayes.
• Popular algorithms that can be used for multi-class classification include: k-Nearest Neighbors, Decision Trees, Naive Bayes, Random Forest, and Gradient Boosting.
K-Means Clustering Algorithm: K-Means Clustering is an unsupervised learning algorithm that is used
to solve the clustering problems in machine learning or data science which groups the unlabeled dataset
into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
• It allows us to cluster the data into different groups and a convenient way to discover the categories of
groups in the unlabeled dataset on its own without the need for any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data point and their corresponding clusters.
• The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of clusters, and
repeats the process until it does not find the best clusters. The value of k should be predetermined in
this algorithm.
The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the particular k-
center, create a cluster.
Elbow Method (choosing the value of K):
• Run K-means for Different Values of K: Start by running the K-means clustering algorithm for a range of values of K. You can choose a reasonable range based on your understanding of the data.
• Compute the Sum of Squared Distances (Inertia): For each value of K, compute the sum of squared
distances (inertia or within-cluster sum of squares). This measures how far each point in a cluster is
from the center of that cluster.
• Plot the Elbow Curve: Plot a curve with the number of clusters (K) on the x-axis and the corresponding
sum of squared distances on the y-axis.
• Identify the "Elbow" Point: Examine the plot. The "elbow" is the point where the reduction in the sum
of squared distances starts to slow down, forming an elbow-like bend in the curve.
• Choose the Elbow Point as the Optimal K: The number of clusters (K) corresponding to the elbow point
is considered the optimal choice. At this point, adding more clusters doesn't significantly improve the
clustering quality.
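A sketch of the elbow procedure with scikit-learn, assuming synthetic data generated around four true centers:
```python
# Sketch: run K-means for several K and inspect the inertia to find the elbow.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=0)

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_          # within-cluster sum of squared distances

for k, val in inertias.items():
    print(f"K={k}: inertia={val:.1f}")
# The drop in inertia slows sharply after K=4 here, so K=4 is the elbow.
```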
What is Dimensionality Reduction?
• The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
• A dataset may contain a huge number of input features in various cases, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
• Dimensionality reduction technique can be defined as, "It is a way of converting the higher dimensions
dataset into lesser dimensions dataset ensuring that it provides similar information." These techniques
are widely used in machine learning for obtaining a better fit predictive model while solving the
classification and regression problems.
• It is commonly used in the fields that deal with high-dimensional data, such as speech recognition,
signal processing, bioinformatics, etc. It can also be used for data visualization, noise reduction, cluster
analysis, etc.
Benefits of Dimensionality Reduction:
Avoid Overfitting: With fewer dimensions, the risk of overfitting diminishes. The model becomes more focused on patterns rather than noise.
Visualize Data: It's challenging to visualize data in high dimensions. Dimensionality reduction helps project
data onto lower-dimensional spaces, making visualization feasible.
Principal Component Analysis (PCA):
• PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.
• PCA works by considering the variance of each attribute, because an attribute with high variance indicates a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
The PCA algorithm is based on some mathematical concepts such as:
• Variance and Covariance
• Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
• Dimensionality: It is the number of features or variables present in the given dataset. More easily, it is
the number of columns present in the dataset.
• Correlation: It signifies that how strongly two variables are related to each other. Such as if one
changes, the other variable also gets changed. The correlation value ranges from -1 to +1. Here, -1
occurs if variables are inversely proportional to each other, and +1 indicates that variables are directly
proportional to each other.
• Orthogonal: It defines that variables are not correlated to each other, and hence the correlation
between the pair of variables is zero.
• Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v.
• Covariance Matrix: A matrix containing the covariance between the pair of variables is called the
Covariance Matrix.
Steps for PCA Algorithm:
• Getting the Dataset: Take the input dataset and split it into two parts: X (training set) and Y (validation
set).
• Representing Data Structure: Represent the dataset in a matrix structure (X). Each row corresponds to
data items, and each column corresponds to features. The number of columns is the dimensions of the
dataset.
• Standardizing the Data: Standardize the dataset to give importance to features with high variance. If
feature importance is independent of variance, divide each data item in a column by the standard
deviation of the column, creating a matrix named Z.
• Calculating the Covariance of Z: Calculate the covariance matrix of Z by transposing Z and multiplying it
by Z.
• Calculating Eigenvalues and Eigenvectors: Calculate the eigenvalues and eigenvectors for the
covariance matrix Z. Eigenvectors represent directions of axes with high information, and eigenvalues
are their coefficients.
• Sorting Eigen Vectors: Sort eigenvalues in decreasing order (largest to smallest) and simultaneously sort
the corresponding eigenvectors in matrix P.
• Calculating New Features (Principal Components): Multiply the sorted eigenvector matrix P* by Z to
obtain the new feature matrix Z*. Each observation in Z* is a linear combination of original features,
and columns are independent.
• Remove Unimportant Features: Decide which features to keep and remove in the new dataset Z*. Keep
relevant and important features, removing less important ones.
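A compact NumPy sketch that follows these steps (standardize, covariance of Z, eigen-decomposition, sort, project) on random data; it is illustrative, not an optimized implementation:
```python
# Sketch of the PCA steps above using NumPy on random data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # rows = data items, columns = features

Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize each column
cov = (Z.T @ Z) / (Z.shape[0] - 1)                  # covariance matrix of Z

eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues / eigenvectors
order = np.argsort(eigvals)[::-1]                   # sort eigenvalues, largest first
P = eigvecs[:, order]                               # sorted eigenvector matrix P*

Z_star = Z @ P                                      # new features (principal components)
k = 2
X_reduced = Z_star[:, :k]                           # keep the k most important components

print("explained variance ratio:", (eigvals[order][:k] / eigvals.sum()).round(3))
print("reduced shape:", X_reduced.shape)
```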
• Applications of Principal Component Analysis: PCA is mainly used as the dimensionality reduction
technique in various AI applications such as computer vision, image compression, etc.
• It can also be used for finding hidden patterns if data has high dimensions. Some fields where PCA is
used are Finance, data mining, Psychology, etc.
• Linear Limitation of PCA: Traditional Principal Component Analysis (PCA) is powerful for linearly
separable datasets. However, it may not be optimal for non-linear datasets, as it assumes linear
relationships between features.
• Introduction to Kernel PCA: Kernel PCA is a technique designed to handle non-linear datasets. It
extends the capabilities of PCA by incorporating a kernel function, allowing the algorithm to project the
data into a higher-dimensional space where it becomes linearly separable.
• Kernel Functions: Kernel functions (e.g., linear, polynomial, Gaussian) play a key role in KPCA. They map
the input data into a space where complex relationships between data points can be linearly
represented.
• Workflow of KPCA:
➢ Nonlinear Mapping: Apply a chosen kernel function to map the original data into a higher-dimensional
feature space.
➢ PCA in the New Space: Conduct PCA in the transformed space. Now, even though the original data may
not be linearly separable, the transformed data might be.
➢ Capture Nonlinear Relationships: Extract principal components in the higher-dimensional space,
capturing complex and nonlinear relationships between data points.
➢ Applications: Use the principal components for various tasks, such as data visualization, clustering, or
classification, in the transformed space.
➢ Benefits of KPCA:
• Handling Non-Linearity: Effective in scenarios where relationships between features are non-linear.
• Complex Pattern Recognition: Captures intricate patterns and structures in the data that linear
methods might miss.
• Versatility: Can be applied to various types of datasets, especially those with non-linear characteristics.
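A minimal scikit-learn sketch contrasting ordinary PCA with Kernel PCA (RBF kernel) on the classic concentric-circles dataset, a non-linear case; the parameter values are illustrative:
```python
# Sketch: Kernel PCA with an RBF (Gaussian) kernel untangling concentric circles,
# a non-linear dataset that ordinary PCA cannot separate linearly.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_pca = PCA(n_components=2).fit_transform(X)
kernel_pca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# After the RBF mapping, the two circles become (approximately) linearly separable
# along the first kernel principal component.
print("linear PCA, first component range :", linear_pca[:, 0].min(), linear_pca[:, 0].max())
print("kernel PCA, first component range :", kernel_pca[:, 0].min(), kernel_pca[:, 0].max())
```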
Matrix Factorization
Matrix factorization is a technique used in various fields, including machine learning and data analysis. At
its core, it involves breaking down a matrix into a product of simpler matrices. This process is valuable for
tasks such as recommendation systems, collaborative filtering, and dimensionality reduction.
➢ Basic Idea: Consider a matrix, say A, with dimensions m×n. The goal of matrix factorization is to
express this matrix as the product of two matrices, B and C, where B has dimensions m×k and C has
dimensions k×n. The parameter k is a user-defined value representing the desired reduced
dimensionality.
➢ Mathematical Representation: A≈B×C
Use Cases:
➢ Recommendation Systems: In collaborative filtering, matrix factorization can represent users and
items in a lower-dimensional space, helping make personalized recommendations.
➢ Image Compression: For images represented as matrices, factorizing them can lead to a more
compact representation, saving storage space.
➢ Text Mining: Applied to document-term matrices in natural language processing to discover latent
topics in a collection of documents.
Matrix Factorization Algorithms:
➢ Singular Value Decomposition (SVD): One of the most common methods, SVD decomposes a matrix
into three matrices representing singular values and left and right singular vectors.
➢ Alternating Least Squares (ALS): Often used in collaborative filtering problems, ALS iteratively updates
user and item matrices to minimize the reconstruction error.
➢ Gradient Descent: Optimization algorithms like gradient descent can be applied to minimize the
difference between the original matrix and the product of the factorized matrices.
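A small NumPy sketch of SVD-based factorization, expressing A ≈ B × C for a user-chosen k; the matrix here is random and purely illustrative:
```python
# Sketch: factorize A (m x n) into B (m x k) and C (k x n) via truncated SVD.
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 4))                      # e.g. a small user-item rating matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                       # desired reduced dimensionality
B = U[:, :k] * s[:k]                        # m x k (left factors scaled by singular values)
C = Vt[:k, :]                               # k x n (right factors)

A_approx = B @ C                            # A ≈ B x C
print("reconstruction error:", np.linalg.norm(A - A_approx))
```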
Benefits:
Matrix Completion
Matrix completion is a technique used in data analysis to fill in or "complete" missing values in a matrix.
Imagine you have a matrix with some entries missing, and you want to predict or estimate those missing
values based on the available information. Matrix completion algorithms are designed to achieve precisely
that.
Key Concepts:
• Incomplete Matrix: Imagine you have a matrix where some entries are unknown or missing. This matrix
could represent various types of data, such as user-item ratings, sensor measurements, or any situation
where not all information is available.
• Objective: The goal of matrix completion is to fill in the missing entries in the matrix accurately. This is
achieved by leveraging the patterns and relationships present in the observed (non-missing) entries.
• Assumption: Matrix completion assumes that the underlying data has some inherent structure or low-
rank property. Low-rank matrices have a reduced number of independent columns or rows, suggesting
that the data can be well-approximated using a smaller number of features.
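A toy sketch of one simple completion strategy: alternately approximate the matrix with a low-rank SVD and re-impose the observed entries. This is an illustrative assumption of how such methods proceed, not a production algorithm:
```python
# Sketch: fill missing entries of a matrix by alternating between a low-rank
# SVD approximation and re-imposing the observed entries.
import numpy as np

rng = np.random.default_rng(0)
true = rng.random((8, 2)) @ rng.random((2, 6))     # a genuinely low-rank (rank-2) matrix
mask = rng.random(true.shape) < 0.7                # True where the entry is observed

X = np.where(mask, true, np.nan)                   # the incomplete matrix
filled = np.where(mask, X, np.nanmean(X))          # start: missing entries = global mean

for _ in range(100):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (U[:, :2] * s[:2]) @ Vt[:2, :]      # rank-2 approximation
    filled = np.where(mask, X, low_rank)           # keep observed values, update the rest

err = np.abs(filled - true)[~mask].mean()
print("mean absolute error on the missing entries:", round(err, 4))
```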
Mixture Models
Mixture models assume that the data is generated by a mixture of several underlying probability
distributions. Each component in the mixture represents a different source or process that contributes to
the overall distribution.
Components: Each component has its own parameters (mean, variance) and a weight that determines its
contribution to the overall mixture.
Example: Think of a mixture of Gaussian distributions. Each Gaussian component represents a cluster in the
data, and the mixture model can capture complex patterns where data points may come from different
clusters.
Use Cases: Mixture models are used in clustering problems, where the goal is to assign data points to
different clusters based on their probability of belonging to each component.
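A minimal scikit-learn sketch of a two-component Gaussian mixture on synthetic one-dimensional data, showing the soft (probabilistic) cluster assignments:
```python
# Sketch: fitting a two-component Gaussian mixture and reading soft cluster
# assignments (responsibilities). Data is synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, size=(200, 1)),
                       rng.normal(6.0, 1.5, size=(200, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

print("component means  :", gmm.means_.ravel())
print("component weights:", gmm.weights_)
print("P(component | x=3):", gmm.predict_proba([[3.0]])[0])  # soft assignment
```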
Latent Factor Models:
Components: Latent factors are hidden variables that are not directly observed but influence the observed data.
Example: In collaborative filtering for recommendation systems, users and items can be represented by
latent factors. The model aims to discover these latent factors to predict how users might rate items.
Use Cases: Latent factor models are commonly used in recommendation systems, matrix factorization, and
other applications where understanding the underlying factors influencing the data is essential.
➢ Key Takeaways: Mixture models and latent factor models are both generative approaches: they describe how the observed data could have been generated from hidden components or factors.
Scalable Machine Learning: Scalability means that a machine learning system remains effective as the data volume, model complexity, and workload grow. Key aspects include:
• Data Volume: Scalable machine learning systems can handle large and growing datasets. As the
amount of data increases, the system remains effective in training models and making predictions.
• Model Complexity: Scalability also applies to the complexity of the machine learning models. A
scalable system should be able to accommodate more intricate models without a significant drop in
performance.
• Computational Resources: Scalable machine learning algorithms efficiently use available
computational resources. This includes the ability to parallelize computations, distribute tasks
across multiple processors or machines, and make use of specialized hardware when available.
• Performance Consistency: Scalability doesn't just mean handling larger volumes of data; it also
involves maintaining consistent performance. As the system scales, it should still provide reliable
and timely results without becoming prohibitively slow or resource-intensive.
Why Scalability Matters:
• Big Data Challenges: In the era of big data, organizations deal with massive datasets. Scalable
machine learning is crucial for extracting meaningful insights from these vast amounts of
information.
• Complex Models: As machine learning models become more sophisticated, they often require more
computational resources. Scalability ensures that these models can be trained and deployed
efficiently.
• Real-Time Processing: Some applications, such as real-time analytics or online services, require
quick responses. Scalable machine learning allows for timely predictions even when faced with large
workloads.
• Cost Efficiency: Efficient use of resources is vital for cost-effectiveness. Scalable solutions can
leverage resources more effectively, reducing the total cost of computation.
How Scalability is Achieved:
• Distributed Computing Frameworks: Technologies like Apache Spark and Hadoop enable the distribution of machine learning tasks across a cluster of machines, enhancing scalability.
• Parallel Processing: Algorithms that can be parallelized, such as stochastic gradient descent, allow
for efficient use of multiple processors, speeding up training on large datasets.
• Cloud Computing: Cloud platforms provide scalable infrastructure for machine learning, allowing
users to scale up or down based on their computational needs.
Semi-Supervised Learning
Semi-supervised learning is a type of machine learning where the model is trained on a dataset that
contains both labeled and unlabeled data. In traditional supervised learning, the model is trained solely on
labeled data, where each input is associated with a corresponding output. However, obtaining labeled data
can be expensive and time-consuming. Semi-supervised learning seeks to leverage both labeled and
unlabeled data to build a more robust and accurate model.
How Semi-Supervised Learning Works:
• Initial Labeled Training: The model is first trained on the available labeled data in a supervised
manner. This helps the model learn from the explicit input-output associations.
• Unlabeled Data Utilization: After the initial training, the model is fine-tuned or further trained on
the unlabeled data. The model generalizes from the patterns observed in the labeled data to make
predictions on the unlabeled instances.
• Semi-Supervised Algorithms: Various algorithms and techniques are designed for semi-supervised
learning, including self-training, co-training, and multi-view learning. These approaches leverage the
unlabeled data in different ways to enhance model performance.
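A minimal sketch of the self-training approach with scikit-learn's SelfTrainingClassifier; hiding 70% of the Iris labels is an arbitrary illustrative choice:
```python
# Sketch: self-training with scikit-learn. Unlabeled samples are marked with -1,
# and the wrapped base classifier iteratively labels them with its own
# confident predictions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1        # hide 70% of the labels

model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)

print("labels used initially:", int((y_partial != -1).sum()), "of", len(y))
print("accuracy on the full (true) labels:", round(model.score(X, y), 3))
```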
Challenges:
• Quality of Unlabeled Data: The effectiveness of semi-supervised learning depends on the quality
and representativeness of the unlabeled data.
• Model Sensitivity: The model's performance may be sensitive to the proportion of labeled and
unlabeled data, and the choice of algorithm.
Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent learns how to behave in an
environment by performing actions and receiving feedback in the form of rewards or penalties. The agent
aims to discover the optimal strategy or policy that maximizes cumulative rewards over time. In simpler
terms, it's like training a computer to make decisions by trial and error, figuring out what actions lead to
better outcomes.
Key Concepts:
• Agent: The learner or decision-maker is referred to as the agent. It interacts with the
environment and makes decisions to achieve specific goals.
• Environment: The external system or context in which the agent operates is called the
environment. It can be anything from a game environment to a robotic system or a
simulated world.
• Actions: The set of possible moves or decisions that the agent can make within the
environment. Actions are taken based on the current state.
• States: Representations of the current situation or configuration of the environment. The
agent's actions influence the transition from one state to another.
• Rewards: Numeric values that the agent receives as feedback after taking an action in a
specific state. Rewards indicate the immediate benefit or cost associated with the action.
• Policy: The strategy or set of rules that the agent follows to decide its actions. The goal is to
find the optimal policy that maximizes the cumulative reward.
How Reinforcement Learning Works:
• Initialization: The agent starts in an initial state within the environment.
• Action Selection: The agent selects an action based on its current state, following a certain
policy.
• Environment Interaction: The selected action influences a transition to a new state within
the environment.
• Reward Assignment: The agent receives a reward or penalty based on the action taken and
the new state reached.
• Learning: The agent adjusts its strategy or policy based on the received feedback, aiming to
improve its decision-making over time.
• Iterative Process: The agent continues to interact with the environment, refining its policy
through repeated trial and error.
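A toy sketch of these steps as tabular Q-learning on a tiny one-dimensional corridor; the environment, rewards, and hyperparameters are all invented for illustration:
```python
# Sketch: tabular Q-learning. States 0..4 form a corridor, actions are
# left/right, and the agent gets a reward of +1 for reaching state 4.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0                                            # initialization
    while state != n_states - 1:
        # action selection: epsilon-greedy policy
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        # environment interaction: move left or right (clipped to the corridor)
        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        # reward assignment: +1 only when the goal state is reached
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # learning: the Q-learning update rule
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print("learned greedy policy for states 0-3 (0=left, 1=right):", Q[:-1].argmax(axis=1))
```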
Applications of Reinforcement Learning:
• Game Playing: Training agents to play games like chess, Go, or video games.
• Robotics: Teaching robots to perform complex tasks in real-world environments.
• Autonomous Vehicles: Enabling self-driving cars to make decisions in dynamic traffic
situations.
• Recommendation Systems: Optimizing recommendations to users based on their
interactions.
Inference in graphical models, particularly in the context of probabilistic graphical models (PGMs), involves
making predictions or estimating unknown variables given observed data and the structure of the
model. Graphical models, such as Bayesian networks or Markov random fields, use graphical
representations to encode probabilistic relationships among a set of variables.
Here's a brief explanation of inference in graphical models:
• Probabilistic Graphical Models (PGMs): PGMs are a family of statistical models that represent
the dependencies between random variables using a graph structure. Nodes in the graph
represent variables, and edges represent probabilistic dependencies.
• Types of Graphical Models:
o Bayesian Networks (BN): Directed acyclic graphs representing probabilistic dependencies
among variables.
o Markov Random Fields (MRF): Undirected graphs representing dependencies using
pairwise potentials.
• Inference Tasks:
o Marginalization: Computing the marginal distribution of one or more variables by
summing or integrating over other variables.
o Conditioning: Updating the probability distribution based on observed evidence or
conditions.
o Maximum A Posteriori (MAP) Estimation: Finding the most probable values of variables
given evidence.
o Joint Probability Estimation: Computing the joint probability of a set of variables.
• Message Passing: Many inference algorithms in graphical models involve message passing
between nodes in the graph. Algorithms like the Belief Propagation algorithm or the Junction
Tree algorithm use message passing to efficiently compute probabilities.
• Applications:
o Medical Diagnosis: Bayesian networks can be used to model the dependencies among
symptoms and diseases for diagnostic purposes.
o Image Segmentation: Markov random fields can represent spatial dependencies among
pixels for image segmentation tasks.
• Challenges:
o Computational Complexity: Inference in graphical models can be computationally expensive, especially for large and complex graphs.
o Graph Structure: The accuracy of inference often depends on the correct modeling of the underlying dependencies in the graph.
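A tiny worked sketch of exact inference by enumeration in a two-node Bayesian network (a Disease node with a Symptom child), with made-up probabilities, illustrating marginalization and conditioning:
```python
# Sketch: exact inference by enumeration in a two-node Bayesian network
#   Disease -> Symptom, with invented probabilities.
# It computes a marginal P(Symptom) and a conditional P(Disease | Symptom=True).

p_disease = {True: 0.01, False: 0.99}                      # P(Disease)
p_symptom_given = {True: 0.9, False: 0.05}                 # P(Symptom=True | Disease)

# Marginalization: P(Symptom=True) = sum over d of P(Symptom=True | d) * P(d)
p_symptom_true = sum(p_symptom_given[d] * p_disease[d] for d in (True, False))

# Conditioning (Bayes' rule): P(Disease=True | Symptom=True)
p_disease_given_symptom = p_symptom_given[True] * p_disease[True] / p_symptom_true

print("P(Symptom=True)                =", round(p_symptom_true, 4))
print("P(Disease=True | Symptom=True) =", round(p_disease_given_symptom, 4))
```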