Module-3 DSV
Module-3 Syllabus:
Feature Generation and Feature Selection
Extracting Meaning from Data: Motivating application: user (customer) retention. Feature
Generation (brainstorming, role of domain expertise, and place for imagination), Feature Selection
algorithms. Filters; Wrappers; Decision Trees; Random Forests. Recommendation Systems: Building
a User-Facing Data Product, Algorithmic ingredients of a Recommendation Engine, Dimensionality
Reduction, Singular Value Decomposition, Principal Component Analysis, Exercise: build your own
recommendation system.
• Feature generation involves identifying, creating, and selecting meaningful variables from the raw data that can
be used in machine learning models to make predictions or understand patterns.
• This process is both an art and a science. Having a domain expert involved is beneficial, but
using creativity and imagination is equally important.
• Remember, feature generation is constrained by two factors: the feasibility of capturing certain
information and the awareness to consider capturing it.
2. Explain the importance of feature selection along with the types of feature selection methods.
Feature Selection
Feature Selection refers to the process of selecting the most relevant features (or variables) from
the dataset to use in building a predictive model. The goal is to improve the performance of the
model by eliminating irrelevant or redundant features, which can lead to better accuracy, reduced
overfitting, and more efficient computation.
Importance of Feature Selection: Feature selection is crucial because it helps in simplifying
models, making them easier to interpret, reducing computational cost, and often improving the
generalization of the model by reducing overfitting.
Filter Methods:
Filters prioritize features based on specific metrics or statistics, such as correlation with the
outcome variable, offering a quick overview of each feature's predictive power. They use statistical
techniques to evaluate the relevance of each feature individually based on its relationship with
the target variable. Examples include correlation coefficients, Chi-square tests, and mutual
information.
A subset of features is selected based on their relationship to the target variable. The selection
is independent of any machine learning algorithm; instead, filter methods measure the
"relevance" of the features with respect to the output via statistical tests. Which test is
appropriate depends on whether the feature and the target are continuous or categorical (for
example, Pearson's correlation for a continuous feature and continuous target, and the Chi-square
test when both are categorical).
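A minimal sketch of a filter method is given below, using scikit-learn's SelectKBest with mutual information as the scoring statistic; the customer-retention-style column names and the tiny dataset are made up for illustration, and no model is involved in the ranking.

# Filter method sketch: rank features by a statistic computed against the target,
# independently of any learning algorithm. Data below is made up for illustration.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

df = pd.DataFrame({
    "visits_per_week":       [1, 5, 2, 7, 0, 6, 3, 8],
    "days_since_last_login": [30, 2, 20, 1, 45, 3, 15, 1],
    "account_age_days":      [100, 400, 120, 380, 90, 410, 200, 500],
    "churned":               [1, 0, 1, 0, 1, 0, 1, 0],
})

X = df.drop(columns="churned")
y = df["churned"]

# Score every feature by its mutual information with the target and keep the top 2.
selector = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)

print(pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False))
print("selected:", list(X.columns[selector.get_support()]))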
Wrapper methods
In wrapper methods, the feature selection process is based on a specific machine
learning algorithm that we are trying to fit on a given dataset.
It follows a greedy search approach, evaluating candidate combinations of features
against the evaluation criterion. The evaluation criterion is simply a performance
measure which depends on the type of problem.
Forward selection:
Forward selection involves systematically adding features to a regression model one at a time based
on their ability to improve model performance according to a selection criterion. This iterative
process continues until further feature additions no longer enhance the model performance.
Backward elimination:
Begins with a regression model containing all features. One feature is then removed at a time:
the one whose removal gives the biggest improvement in the selection criterion. Removal stops
when dropping any further feature would make the selection criterion worse.
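Both search directions can be sketched with scikit-learn's SequentialFeatureSelector, which greedily adds or removes features based on the cross-validated performance of a chosen model; the breast-cancer dataset and logistic regression are arbitrary choices for illustration, not part of the original notes.

# Wrapper method sketch: the selection is driven by the performance of a specific model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Forward selection: start with no features and repeatedly add the most helpful one.
forward = SequentialFeatureSelector(model, n_features_to_select=5,
                                    direction="forward", cv=5).fit(X, y)
print("forward selection keeps   :", forward.get_support(indices=True))

# Backward elimination: start with all features and repeatedly drop the least useful one.
backward = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward", cv=5).fit(X, y)
print("backward elimination keeps:", backward.get_support(indices=True))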
B. Selection Criterion
The choice of selection criteria in feature selection methods may seem arbitrary. To address this,
experimenting with various criteria can help assess model robustness. Different criteria may yield
diverse models, necessitating the prioritization of optimization goals based on the problem context
and objectives.
R-squared
R-squared can be interpreted as the proportion of variance explained by your model.
p-values
In regression analysis, the interpretation of p-values involves assuming a null hypothesis where the
coefficients (βs) are zero. A low p-value suggests that observing the data and obtaining the
estimated coefficient under the null hypothesis is highly unlikely, indicating a high likelihood that
the coefficient is non-zero.
AIC (Akaike Information Criterion)
Given by the formula 2k−2ln(L), where k is the number of parameters in the model and
ln(L) is the “maximized value of the log likelihood.” The goal is to minimize AIC.
BIC (Bayesian Information Criterion)
Given by the formula k*ln(n) −2ln(L), where k is the number of parameters in the model, n
is the number of observations (data points, or users), and ln(L) is the maximized value of
the log likelihood. The goal is to minimize BIC.
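To make the two criteria concrete, here is a tiny helper that implements the formulas above directly; the log-likelihood, k and n values in the example are made-up numbers, not taken from the notes.

import numpy as np

def aic(log_likelihood, k):
    # AIC = 2k - 2 ln(L); smaller is better.
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # BIC = k ln(n) - 2 ln(L); penalizes extra parameters more strongly as n grows.
    return k * np.log(n) - 2 * log_likelihood

# Illustrative numbers: a 3-parameter model, 100 observations, maximized log-likelihood -120.5.
print(aic(-120.5, k=3))         # 247.0
print(bic(-120.5, k=3, n=100))  # about 254.8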
Entropy
Entropy measures the impurity (uncertainty) of a set of labels: H = -Σ p_i log2(p_i), where p_i is
the proportion of class i. A split that lowers entropy produces purer, more informative partitions;
this is the criterion used by the decision trees discussed below.
Embedded Methods: These involve algorithms that perform feature selection during the model
training process. Regularization methods are the typical example: LASSO (Least Absolute Shrinkage
and Selection Operator) shrinks some coefficients exactly to zero and thereby selects features,
while the closely related Ridge Regression only shrinks coefficients without eliminating them.
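A brief sketch of embedded selection with LASSO follows; the diabetes dataset is an arbitrary choice, and LassoCV simply picks the regularization strength by cross-validation while the fit itself zeroes out some coefficients.

# Embedded method sketch: selection happens during training, because LASSO drives
# some coefficients exactly to zero. Dataset choice is for illustration only.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # regularization is sensitive to feature scale

lasso = LassoCV(cv=5).fit(X, y)
print("non-zero coefficient indices:", np.flatnonzero(lasso.coef_))
print("coefficients:", np.round(lasso.coef_, 1))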
Decision Trees
A decision tree is a non-parametric supervised learning algorithm used for both classification and
regression tasks. It is a hierarchical model used in decision support that depicts decisions and their
potential outcomes, incorporating chance events, resource costs, and utility, and it is built from
conditional control statements. The tree structure consists of a root node, branches, internal nodes,
and leaf nodes.
It is a tool with applications spanning several different areas. As the name suggests, it uses a
flowchart-like tree structure to show the predictions that result from a series of feature-based
splits: it starts at a root node and ends with a decision made at the leaves, which makes the
resulting models easy to understand.
Example of Decision Tree
https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/
https://medium.com/geekculture/step-by-step-decision-tree-id3-algorithm-from-scratch-in-python-no-fancy-library-4822bbfdd88f
Solution:
First, check which attribute provides the highest Information Gain in order to split the
training set based on that attribute. We need to calculate the expected information to
classify the set and the entropy of each attribute.
The entropy of the full training set (9 "yes" and 5 "no" records) is:
Entropy(S) = I(9,5) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94
The information gain of an attribute is this entropy minus the weighted entropy of the partitions
produced by splitting on that attribute.
For Age, we have three values: age<=30 (2 yes and 3 no), age31..40 (4 yes and 0 no), and
age>40 (3 yes and 2 no):
Entropy(age) = 5/14(0.971) + 4/14(0) + 5/14(0.971) = 0.694, so Gain(Age) = 0.940 - 0.694 = 0.246
For Income, we have three values: income=high (2 yes and 2 no), income=medium (4 yes and 2 no),
and income=low (3 yes and 1 no):
Entropy(income) = 4/14(1) + 6/14(0.918) + 4/14(0.811) = 0.911, so Gain(Income) = 0.940 - 0.911 = 0.029
Next, consider the Student attribute. For Student, we have two values: student=yes (6 yes and 1 no)
and student=no (3 yes and 4 no):
Entropy(student) = 7/14(0.5916) + 7/14(0.9852) = 0.789, so Gain(Student) = 0.940 - 0.789 = 0.151
For Credit_Rating, we have two values: credit_rating=fair (6 yes and 2 no) and
credit_rating=excellent (3 yes and 3 no):
Entropy(credit_rating) = 8/14(0.8112) + 6/14(1) = 0.892, so Gain(Credit_Rating) = 0.940 - 0.892 = 0.048
Since Age has the highest Information Gain, we start splitting the dataset using
the Age attribute.
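The hand calculation above can be reproduced with a few lines of Python; the yes/no counts are taken from the example, and small differences in the last decimal place are only due to rounding in the manual computation.

# Recompute the information gains for the 14-record buys_computer training set.
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def info_gain(parent, splits):
    # splits: one (yes, no) pair of counts per attribute value.
    n = sum(parent)
    remainder = sum(sum(s) / n * entropy(s) for s in splits)
    return entropy(parent) - remainder

S = (9, 5)  # 9 yes, 5 no
print("Entropy(S)          =", round(entropy(S), 3))                              # 0.940
print("Gain(Age)           =", round(info_gain(S, [(2, 3), (4, 0), (3, 2)]), 3))  # 0.247
print("Gain(Income)        =", round(info_gain(S, [(2, 2), (4, 2), (3, 1)]), 3))  # 0.029
print("Gain(Student)       =", round(info_gain(S, [(6, 1), (3, 4)]), 3))          # 0.152
print("Gain(Credit_Rating) =", round(info_gain(S, [(6, 2), (3, 3)]), 3))          # 0.048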
Decision Tree after step 1
Since all records under the branch age31..40 belong to the same class (Yes), we can replace
that node with a leaf labelled Class = Yes.
The same process of splitting has to happen for the two remaining branches.
Left sub-branch
For branch age<=30 we still have attributes income, student, and credit_rating. Which
one should be used to split the partition?
For Income, we have three values: income=high (0 yes and 2 no), income=medium (1 yes and 1 no)
and income=low (1 yes and 0 no), which still leaves mixed partitions.
For Student, we have student=yes (2 yes and 0 no) and student=no (0 yes and 3 no): both partitions
are pure, so Entropy(student) = 0 and the information gain is the maximum possible (0.971).
We can therefore safely split on the Student attribute without checking the other attributes,
since its information gain is maximized.
Right sub-branch
The entropy of this partition is Entropy(S_age>40) = I(3,2) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97
For Income, we have two values: income=medium (2 yes and 1 no) and income=low (1 yes and 1 no):
Entropy(income) = 3/5(-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5(-1/2 log2(1/2) - 1/2 log2(1/2))
= 3/5(0.918) + 2/5(1) = 0.951, so Gain(Income) = 0.97 - 0.951 = 0.02
For Credit_Rating, we have two values: credit_rating=fair (3 yes and 0 no) and
credit_rating=excellent (0 yes and 2 no):
Entropy(credit_rating) = 0, so Gain(Credit_Rating) = 0.97, the maximum possible.
We then split on Credit_Rating. These splits give partitions whose records all belong to the
same class, so we simply turn them into leaf nodes with their class labels attached:
credit_rating = fair → Buys_computer = yes
credit_rating = excellent → Buys_computer = no
4. Explain the Random Forest algorithm with an example. (or) What are the drawbacks of
decision trees? Mention the key features of Random Forest and explain how it works.
Random Forest
High Predictive Accuracy: Imagine Random Forest as a team of decision-making wizards. Each
wizard (decision tree) looks at a part of the problem, and together, they weave their insights into a
powerful prediction tapestry. This teamwork often results in a more accurate model than what a
single wizard could achieve.
Resistance to Overfitting: Because each tree is trained on a different bootstrap sample and a random
subset of features, the ensemble does not get too caught up in any one view of the training data,
which makes the model less prone to overfitting.
Large Datasets Handling: Each helper takes on a part of the dataset, ensuring that the
expedition is not only thorough but also surprisingly quick.
Variable Importance Assessment: It assesses the importance of each clue in solving the case,
helping you focus on the key elements that drive predictions.
Built-in Cross-Validation: The out-of-bag (OOB) samples left out of each tree's bootstrap sample act
as a built-in validation set, so the model doesn't just ace the training data but is also checked on
records it has not seen.
https://www.geeksforgeeks.org/random-forest-algorithm-in-machine-learning/
4. Apply the random forest algorithm to the iris dataset. Compute the accuracy using a decision tree
and a random forest and justify the results.
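A possible sketch of this exercise with scikit-learn is shown below; exact accuracies depend on the train/test split and random seed, but the comparison illustrates the point that the forest is at least as accurate as a single tree.

# Compare a single decision tree with a random forest on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("random forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
# Justification: averaging many de-correlated trees reduces variance, so the forest
# generalizes at least as well as a single tree; on an easy dataset like iris both
# scores are high and may even tie.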
A recommendation engine filters the data using different algorithms and recommends
the most relevant items to users. It first captures the past behavior of a customer and
based on that, recommends products which the users might be likely to buy.
If a completely new user visits an e-commerce site, that site will not have any past
history of that user. So how does the site go about recommending products to the user
in such a scenario? One possible solution could be to recommend the best selling
products, i.e. the products which are high in demand. Another possible solution could
be to recommend the products which would bring the maximum profit to the business.
If we can recommend a few items to a customer based on their needs and interests, it
will create a positive impact on the user experience and lead to frequent visits. Hence,
businesses nowadays are building smart and intelligent recommendation engines by
studying the past behavior of their users.
Data collection is the first and most crucial step for building a recommendation engine. The data
can be collected by two means: explicitly and implicitly. Explicit data is information
that is provided intentionally, i.e. input from the users such as movie ratings. Implicit
data is information that is not provided intentionally but gathered from available data
streams like search history, clicks, order history, etc.
Filtering Algorithms
1. Content based filtering
This algorithm recommends products which are similar to the ones that a user has liked
in the past.
The similarity between two movies' feature vectors is measured with cosine similarity,
cos(θ) = (A·B) / (||A|| ||B||). Based on this cosine value, which ranges from -1 to 1, the movies
are arranged in descending order and one of the two approaches below is used for recommendations:
Top-n approach: where the top n movies are recommended (Here n can be decided by
the business)
Rating scale approach: Where a threshold is set and all the movies above that
threshold are recommended
Euclidean Distance: Similar items will lie in close proximity to each other if plotted in
n-dimensional space. So, we can calculate the distance between items and, based on
that distance, recommend items to the user. The Euclidean distance between items x and y is
given by:
d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)
Pearson's Correlation: It tells us how much two items are correlated; the higher the
correlation, the greater the similarity. Pearson's correlation can be calculated using
the following formula:
r = Σ(x_i - x̄)(y_i - ȳ) / sqrt(Σ(x_i - x̄)^2 · Σ(y_i - ȳ)^2)
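The three measures above can be sketched in a few lines of NumPy; the two "movie feature vectors" are made up purely to illustrate how the numbers are read.

# Similarity/distance measures commonly used in content-based filtering.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def pearson_correlation(a, b):
    return np.corrcoef(a, b)[0, 1]

movie_1 = np.array([4.0, 1.0, 2.0, 4.0])   # made-up feature vectors
movie_2 = np.array([5.0, 1.0, 3.0, 4.0])

print("cosine   :", round(cosine_similarity(movie_1, movie_2), 3))   # near 1 -> very similar
print("euclidean:", round(euclidean_distance(movie_1, movie_2), 3))  # small distance -> similar
print("pearson  :", round(pearson_correlation(movie_1, movie_2), 3)) # near 1 -> strongly correlated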
2. Collaborative filtering
Let us understand this with an example. If person A likes 3 movies, say Interstellar,
Inception and Predestination, and person B likes Inception, Predestination and The
Prestige, then they have almost similar interests. We can say with some certainty
that A should like The Prestige and B should like Interstellar. The collaborative
filtering algorithm uses “User Behavior” for recommending items.
User-User collaborative filtering
This algorithm first finds the similarity score between users. Based on this similarity
score, it then picks out the most similar users and recommends products which these
similar users have liked or bought previously.
This algorithm finds the similarity between each user based on the ratings they have
previously given to different movies. The prediction of an item for a user u is
calculated by computing the weighted sum of the user ratings given by other users
to an item i.
The prediction Pu,i is given by:
Pu,i = Σv (Rv,i · Su,v) / Σv Su,v
Here,
Pu,i is the prediction of item i for user u
Rv,i is the rating given by a user v to a movie i
Su,v is the similarity between users u and v, and the sums run over the users v who have rated item i
Let us understand it with an example:
The correlation between user A and C is more than the correlation between B and C.
Hence users A and C have more similarity and the movies liked by user A will be recommended
to user C and vice versa.
This algorithm is quite time consuming as it involves calculating the similarity for
each user and then calculating prediction for each similarity score. One way of
handling this problem is to select only a few users (neighbors) instead of all to
make predictions, i.e. instead of making predictions for all similarity values, we
choose only few similarity values.
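The prediction formula above can be sketched as follows; the tiny rating matrix is made up, Pearson correlation is used as the similarity S(u,v), and only positively correlated neighbours who rated the item are kept, in the spirit of the neighbour-selection idea just described.

# User-user collaborative filtering sketch: P(u,i) = sum_v(R(v,i)*S(u,v)) / sum_v(S(u,v)).
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {"m1": [4, 1, np.nan], "m2": [5, 2, 4], "m3": [1, 5, 2], "m4": [4, 2, 3]},
    index=["A", "B", "C"],
)

def similarity(u, v):
    # Pearson correlation over the items both users have rated.
    common = ratings.loc[u].notna() & ratings.loc[v].notna()
    return np.corrcoef(ratings.loc[u, common], ratings.loc[v, common])[0, 1]

def predict(user, item):
    # Neighbours: other users who rated the item and are positively correlated with `user`.
    others = [v for v in ratings.index if v != user and pd.notna(ratings.loc[v, item])]
    sims = {v: similarity(user, v) for v in others}
    sims = {v: s for v, s in sims.items() if s > 0}
    return sum(s * ratings.loc[v, item] for v, s in sims.items()) / sum(sims.values())

# C's ratings correlate strongly with A's and negatively with B's, so the prediction
# for movie m1 is driven by A's rating of m1.
print(predict("C", "m1"))   # 4.0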
Item-Item collaborative filtering
In this algorithm, we compute the similarity between each pair of movies and recommend movies
similar to the ones a user has already rated highly. It works analogously to user-user collaborative
filtering, but the prediction uses the weighted sum of ratings of "item-neighbours" instead of
"user-neighbours".
User/Movie         x1    x2    x3    x4    x5
A                   4     1     2     4     4
B                   2     4     4     2     1
C                   –     1     –     3     4
Mean Item Rating    3     2     3     3     3
Here the mean item rating is the average of all ratings given to a particular item (compare this
with the table used for user-user filtering). Instead of finding user-user similarity, we now
calculate item-item similarity. For example, to compare movies (x1, x4) and (x1, x5), we look at
the users who have rated both items: the common users who have rated movies x1 and x4 are A and B,
and the users who have rated movies x1 and x5 are also A and B.
The similarity between movie x1 and x4 is more than the similarity between movie x1
and x5. So based on these similarity values, if any user searches for movie x1, they
will be recommended movie x4 and vice versa.
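To make the comparison concrete, the similarity values can be checked with cosine similarity over the ratings of the common users A and B (the notes do not fix a particular similarity metric, so cosine is an illustrative choice):

# Item-item similarity check using the ratings of the common users A and B.
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x1 = np.array([4, 2])   # ratings of movie x1 by users A and B
x4 = np.array([4, 2])   # ratings of movie x4 by users A and B
x5 = np.array([4, 1])   # ratings of movie x5 by users A and B

print("sim(x1, x4) =", round(cosine(x1, x4), 3))  # 1.0
print("sim(x1, x5) =", round(cosine(x1, x5), 3))  # about 0.976 -> x4 is the closer match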
Data Quality Issues: Check the quality and consistency of your data. Ensure that your
data is clean, free from outliers, and properly preprocessed.
Distance Metric Selection: The choice of distance metric (e.g., Euclidean distance,
cosine similarity) can significantly impact the performance of the algorithm.
Experiment with different metrics to see which one best captures the similarity between
items or users in your dataset.
Cold Start Problem: Nearest neighbor algorithms may struggle with cold start
problems, where there isn't enough data available for new users or items. Consider
using hybrid approaches or incorporating content-based features to handle this scenario.
Scalability: For large datasets, computing distances between all pairs of items or users
can be computationally expensive. Look into approximate nearest neighbor methods or
data structures like KD-trees or Ball trees to improve efficiency (see the sketch after this list).
Normalization: Ensure that features used for calculating similarity are properly
normalized to prevent certain features from dominating the distance calculation.
User/item representation: Make sure that your representation of users and items
(feature vectors) appropriately captures the relevant characteristics that define similarity
in your recommendation context.
Evaluation Metrics: Use appropriate evaluation metrics (e.g., precision, recall, RMSE
for rating prediction) to assess the performance of your recommendation system and
identify areas for improvement.
By systematically addressing these potential issues, you should be able to diagnose and
improve the performance of your nearest neighbor algorithm in your recommendation
system.
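As a sketch of the scalability point above, scikit-learn's NearestNeighbors can build a ball-tree index so that neighbour lookups do not require computing every pairwise distance; the random item matrix below is purely illustrative.

# Neighbour lookup with a ball tree instead of brute-force pairwise distances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
item_features = rng.random((10_000, 32))   # 10k items with 32-dimensional representations

index = NearestNeighbors(n_neighbors=6, algorithm="ball_tree").fit(item_features)

# Nearest neighbours of item 0 (the first hit is the item itself, so it is skipped).
distances, indices = index.kneighbors(item_features[:1])
print("neighbours of item 0:", indices[0][1:])
print("distances           :", np.round(distances[0][1:], 3))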
7. What is dimensionality reduction? Describe the different techniques along with the benefits
and applications of dimensionality reduction.
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of input variables (features) in a
dataset while preserving as much of the useful information as possible.
Techniques:
Principal Component Analysis (PCA): PCA is one of the most widely used
dimensionality reduction techniques. It transforms the original variables into a new set
of orthogonal variables (principal components) that capture the maximum variance in
the data.
Linear Discriminant Analysis (LDA): LDA is often used in supervised learning tasks
to find the feature subspace that maximizes class separability.
Autoencoders: These are neural network models that learn efficient representations
of data by encoding input into a lower-dimensional latent space and then reconstructing
the output from this representation.
Benefits:
Improved Model Performance: By reducing noise and irrelevant features,
dimensionality reduction can lead to better generalization and predictive performance
of machine learning models.
Efficiency: Reduced dimensionality can lead to faster training times and less memory
usage, especially beneficial for large datasets.
Singular Value Decomposition (SVD)
SVD factorizes a matrix A into A = U Σ V^T, where U and V have orthonormal columns and Σ is a
diagonal matrix of singular values.
Dimensionality Reduction:
SVD is used for reducing the dimensionality of data. By retaining only the most
significant singular values and corresponding vectors, you can represent the original
matrix with reduced dimensions.
Matrix Approximation:
o SVD allows for approximating a matrix A by using only the first k singular
values and vectors. This approximation can be useful for compressing data or
denoising.
o PCA can be seen as a specific application of SVD, where the covariance matrix of a
dataset is decomposed to find its principal components.
Implicit Feedback Handling: SVD can handle implicit feedback data (e.g., user views
or clicks) effectively by capturing underlying patterns in user-item interactions.
Scalability: Techniques like incremental SVD and stochastic gradient descent can be
used to scale SVD to large datasets.
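A minimal NumPy sketch of the rank-k approximation idea is shown below; the small user-item matrix and the choice k = 2 are made up for illustration.

# Truncated SVD: keep only the k largest singular values/vectors to approximate A.
import numpy as np

A = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-2 approximation in the least-squares sense

print("singular values:", np.round(s, 2))
print("rank-2 approximation:")
print(np.round(A_k, 2))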
Principal Component Analysis (PCA) is a widely used statistical technique in data science for
reducing the dimensionality of data while retaining as much variance as possible. It transforms a
set of correlated variables (or features) into a set of linearly uncorrelated variables called
principal components, which are ordered by the amount of variance they explain in the original
data.
Covariance Matrix:
PCA starts by computing the covariance matrix of the (centred) dataset, which captures the
pairwise relationships between the different variables. The eigenvectors of this matrix, ordered by
their eigenvalues, are the principal components, and projecting the data onto the top components
gives the reduced representation.
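These steps can be sketched directly in NumPy: centre the data, build the covariance matrix, take its eigenvectors, and project; the correlated 2-D data below is randomly generated for illustration.

# PCA from first principles via the covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])   # correlated features

X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)        # pairwise covariances between the variables

eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]            # order components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X_centred @ eigvecs                 # data expressed in the principal components
print("explained variance ratio:", np.round(eigvals / eigvals.sum(), 3))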
Applications of PCA:
Dimensionality Reduction:
PCA is primarily used for reducing the number of variables in a dataset while retaining
most of the information. This is beneficial for improving computational efficiency,
reducing noise, and avoiding overfitting in machine learning models.
Visualization:
PCA is valuable for visualizing high-dimensional data. By reducing data to two or
three principal components, it becomes easier to plot and understand the underlying
structure and relationships.
Feature Extraction:
PCA can be used as a feature extraction technique where the principal components
serve as new features that may be more informative or less redundant than the original
variables.
Noise Reduction:
PCA can effectively filter out noise by emphasizing variations in data that are
significant (captured by principal components with high eigenvalues) and disregarding
variations that are less significant (captured by components with low eigenvalues).
Advantages of PCA:
Interpretability: Principal components are linear combinations of original variables,
making them interpretable in terms of the contributions of different features.
Data Compression: PCA allows for data compression by reducing the number of
dimensions while preserving most of the variance, which is useful for storage and
computation.
Improves Model Performance: By reducing the number of input variables, PCA can
lead to simpler and more efficient models that generalize better to new data.