DSC 190 Final Report
1 INTRODUCTION
In this project, I implement eight machine learning models: Linear Regression, Logistic Regression, Naive Bayes,
K-Means, Gaussian Mixture, Decision Tree, Random Forest, and Matrix Factorization. For each model, I fit it to a
real-world dataset and explore how different parameter settings affect the results of classification, regression, and clustering
tasks. In this report, I briefly explain the algorithms, the implementation details, and the models' performance.
2 LINEAR REGRESSION
2.1 Introduction
Linear Regression is a simple regression model that aims to find a linear relationship between the variables in the given
data. In this section, I implement a plain linear regression model without any regularization. There are two ways to
fit the model: inverting the input matrix and gradient descent. I use gradient descent, as the starter code
suggested. Mathematical interpretation:
$Y_i = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots$    (1)
$Y_i$ is the target variable, the $x$'s are the features, and the $\theta$'s are the weights for each feature. The goal of the model is to find
those $\theta$'s.
2.2 Algorithm
Begin with the input data; if an intercept is required (the input is not centered at the origin), we append a column of 1s to the input.
The weight assigned to this column is the intercept. The model initializes the weights with a uniform distribution from
-1 to 1. Then it runs gradient descent for multiple iterations or until early stopping is triggered (when the rate at which the error
decreases becomes small). The following is the gradient of the loss (MSE) with respect to the weights:
$\frac{\partial L}{\partial W} = \frac{-2}{n} X^T (y - XW)$    (2)
where $L$ is the loss, $n$ is the number of samples, $X$ is the input matrix, $y$ is the target output, and $W$ is the weight matrix.
The weights are updated by subtracting the product of the learning rate and the gradient. The update is applied only if it
decreases the error. For the learning rate, I use an adaptive scheme, which decreases the learning rate when an update fails
to reduce the loss and increases it when an update succeeds.
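The following is a minimal sketch of this training loop in numpy; the function name, the early-stopping threshold, and the learning-rate multipliers are illustrative choices, not values taken from the notebook.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.01, n_iter=1000, tol=1e-8, fit_intercept=True):
    """Plain linear regression trained with gradient descent on the MSE loss."""
    if fit_intercept:
        X = np.hstack([X, np.ones((X.shape[0], 1))])   # append a column of 1s; its weight is the intercept
    n, d = X.shape
    W = np.random.uniform(-1, 1, size=d)               # initialize the weights uniformly in [-1, 1]
    prev_loss = np.inf
    for _ in range(n_iter):
        grad = (-2.0 / n) * X.T @ (y - X @ W)          # eq. (2): dL/dW for the MSE loss
        W_new = W - lr * grad
        loss = np.mean((y - X @ W_new) ** 2)
        if loss < prev_loss:                           # accept the update only if the loss decreases
            W = W_new
            lr *= 1.05                                 # adaptive learning rate: grow after a successful step
            if prev_loss - loss < tol:                 # early stopping once the improvement is tiny
                break
            prev_loss = loss
        else:
            lr *= 0.5                                  # shrink the learning rate after a failed step
    return W
```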
2.3 Performance
In this section, I test the model’s performance on the wine-quality dataset. The task is predicting the wine quality
(integer 3 8) by density and alcohol concentration. I run the model with and without min max normalization for 100000
iteration each and compare the result. The following graph is the Loss vs Number of Iteration plot for the first 100
iterations. I choose to only show the first 100 iteration is because (1) the initial loss is too large to see any trend for the
1
2 Jerry Chan
later loss (2) the change of loss in later iteration is very small. The loss is adjusted to the not normalized loss so they
can be compared.
Model Loss
Sklearn Benchmark 800.6677
Without normalization 805.6663
With normalization 800.6678
The model trained with normalized data converges faster and ends with a lower loss. From the gradient descent
formula, we can see that features with smaller scales get smaller gradients. Since all weights share the same learning
rate, the weights are tuned unevenly, making the model harder to train.
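For reference, the min-max normalization mentioned above rescales every feature column to [0, 1]; a small sketch is below (the notebook may instead use sklearn's MinMaxScaler).

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each feature column to the [0, 1] range (assumes no constant columns)."""
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    return (X - X_min) / (X_max - X_min)
```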
3 LOGISTIC REGRESSION
3.1 Introduction
Logistic regression is a model for binary classification. It is composed of a linear regression model and a logistic function.
The logistic function takes the output of the linear function and outputs a prediction between 0 and 1. Mathematical
interpretation:
$Y = \frac{1}{1 + e^{-(XW + b)}}$    (3)
$Y$ is the prediction, $X$ is the input, $W$ is the weight vector, and $b$ is the bias.
3.2 Algorithm
Similar as linear regression, I use gradient descend to train the model. The gradient is calculated as follow:
𝜕𝐿 𝑦ˆ − 𝑦
= (4)
𝜕𝑦 𝑛
ˆ output, 𝑦: target
𝑦:
$\frac{\partial L}{\partial Z} = \frac{\partial L}{\partial \hat{y}} \, \sigma(Z)(1 - \sigma(Z))$    (5)
$\sigma$: logistic function, $Z$: input to the logistic function
$\frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial Z}$    (6)
$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial Z}$    (7)
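A minimal sketch of a single gradient step following equations (4) through (7); the function names and the learning rate are illustrative choices, not details taken from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(X, y, W, b, lr=0.1):
    """One gradient-descent update for logistic regression, following eqs. (4)-(7)."""
    n = X.shape[0]
    y_hat = sigmoid(X @ W + b)                   # forward pass, eq. (3)
    dL_dy = (y_hat - y) / n                      # eq. (4)
    dL_dZ = dL_dy * y_hat * (1.0 - y_hat)        # eq. (5): sigma(Z)(1 - sigma(Z)) equals y_hat(1 - y_hat)
    dL_dW = X.T @ dL_dZ                          # eq. (6)
    dL_db = dL_dZ.sum()                          # eq. (7), summed over the samples
    return W - lr * dL_dW, b - lr * dL_db
```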
3.3 Performance
I continue to use the wine quality dataset. I extract the wines that have a quality rating of 5 or 6. The task is to predict
whether the quality rating is 5 or 6 on this subset of data using all the features (acidity, dioxide, density, etc.). The result
is measured by cross entropy, and the model is trained for 100,000 iterations.
The final training loss is and the final training accuracy is .
4 NAIVE BAYES
4.1 Introduction
Naive Bayes is a classification model that takes a probabilistic approach. It relies on the assumption that all features are
conditionally independent given the class.
4.2 Algorithm
During training, the model estimates $P(y)$ and $P(x_i|y)$ for every category $y$ and feature $x_i$ from the training data. In the prediction step, it
approximates the posterior as $P(y|x) \propto P(y) \prod_i P(x_i|y)$ and outputs the category with the highest posterior probability.
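A condensed sketch of this procedure for categorical features; the Laplace smoothing parameter and the function names are additions of this sketch and may differ from the notebook.

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate log P(y) and log P(x_i | y) from categorical training data, with Laplace smoothing."""
    classes = np.unique(y)
    log_prior = {c: np.log(np.mean(y == c)) for c in classes}     # log P(y)
    log_cond = {}                                                  # log_cond[(c, j)][v] = log P(x_j = v | y = c)
    for c in classes:
        Xc = X[y == c]
        for j in range(X.shape[1]):
            values = np.unique(X[:, j])
            total = len(Xc) + alpha * len(values)
            log_cond[(c, j)] = {v: np.log((np.sum(Xc[:, j] == v) + alpha) / total) for v in values}
    return classes, log_prior, log_cond

def predict_naive_bayes(X, classes, log_prior, log_cond):
    """Return, for each row, the class maximizing log P(y) + sum_i log P(x_i | y)."""
    preds = []
    for x in X:   # assumes every test value also appears somewhere in the training data
        scores = {c: log_prior[c] + sum(log_cond[(c, j)][v] for j, v in enumerate(x))
                  for c in classes}
        preds.append(max(scores, key=scores.get))
    return np.array(preds)
```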
4.3 Performance
The dataset I used for this section is the balance scale dataset. Each sample is classified as tipping to the right, tipping to the left, or
balanced. The features are the left weight, the left distance, the right weight, and the right distance. Below is the
comparison between the Naive Bayes model from sklearn (CategoricalNB) and my model.
My model achieves the same accuracy as the one from sklearn, suggesting my implementation is correct.
5 K-MEANS
5.1 Introduction
K-means is a clustering algorithm that takes a number k and splits the given data points into k clusters. The model aims to
minimize the within-cluster sum of squares.
5.2 Algorithm
The model is initialized by randomly selecting k data points as "centroids". We then fit the model to the dataset by repeating
the following steps:
1. Assign each data point to the nearest centroid. The points assigned to the same centroid now form a cluster.
2. Calculate the mean of the data points in each cluster.
3. If the means equal the centroids, the model is fitted; otherwise, make the means the new centroids.
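A sketch of these steps; the random initialization, the convergence check, and the names are illustrative choices of this sketch.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # random data points as initial centroids
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: the mean of each cluster is the candidate new centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop once the centroids no longer move; otherwise keep iterating.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```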
5.3 Performance
I perform k-means on the Iris dataset. The dataset contains three kinds of iris, and I select 'petal length (cm)' and 'petal
width (cm)' as the only features used by the model. The figure below is a visualization of the training process.
It took the model 11 iterations to converge. The centroid of the green cluster shifted upward as training proceeded,
leading to more reasonable clusters.
The k-means model also works in higher-dimensional data spaces. I repeat the same task with one more feature: 'petal
length (cm)'. The following plot is the result:
6 GAUSSIAN MIXTURE
6.1 Introduction
Gaussian Mixture is a "soft" version of the K-means algorithm. Each cluster is a Gaussian distribution. For each data point,
we calculate the probability of it being generated by each cluster. The point is then assigned to the cluster it is most
likely to belong to, and we update the parameters of the distributions. It is a soft clustering algorithm, as it gives each data
point a probability distribution over clusters instead of a hard label.
6.2 Algorithm
1. Decide the initial centroids with the K-means algorithm.
2. Calculate the posterior probability of each data point belonging to each cluster.
3. Update the parameters of each cluster's distribution.
4. Repeat steps 2 and 3 for a set number of iterations.
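A condensed sketch of this EM loop; it uses scipy for the Gaussian density and adds a small ridge to the covariances for numerical stability, both choices of this sketch rather than details taken from the notebook.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, k, n_iter=100, init_means=None, seed=0):
    """EM for a Gaussian mixture: the E-step computes posteriors, the M-step refits each Gaussian."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Step 1: initial centroids (random points here; the report uses K-means for this step).
    means = np.array(init_means) if init_means is not None else X[rng.choice(n, k, replace=False)]
    covs = np.array([np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Step 2 (E-step): posterior probability of each point belonging to each cluster.
        dens = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, means[j], covs[j]) for j in range(k)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M-step): refit the weight, mean, and covariance of each cluster.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs, resp
```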
6.3 Performance
In this section, I test my Gaussian Mixture model on a simulated dataset generated by sklearn.datasets.make_blobs.
I set the cluster standard deviation to 2, the number of clusters to 3, and the number of features (the dimension of the feature
space) to 3. I generate 500 samples with the function, and the following plot shows a 3-dimensional scatter plot of the
data.
I fit my Gaussian Mixture model to the data and trained it for 100 epochs. I label each data point with the cluster of maximum
likelihood. The result is shown in the figure below:
We can see that the model produces a reasonable clustering. To further confirm my implementation is
correct, I ran the same test with the sklearn model sklearn.mixture.GaussianMixture. The following table shows the clusters
produced by each model:
The mean absolute difference of the weighted log probabilities per sample between the results of the two models is
0.03. Based on this comparison, we can conclude that the results are essentially identical.
7 DECISION TREE
7.1 Introduction
A decision tree is a tree-like model. The tree is built by splitting the training data to maximize the information gain
of each split. The model predicts the output by checking some features of the input to decide which leaf the input
belongs to, and makes the prediction based on the training data in that leaf. In this section, we implement a
classification decision tree.
7.2 Algorithm
7.2.1 Train.
1. List all the splitting rules and select a random subset of features and rules (the number of features and rules used is
controlled by the user)
2. Find the splitting rule with the highest information gain
3. Record the splitting rule and split the data accordingly
4. Determine whether the split is valid (the tree does not exceed the depth limit and the size of each split is greater than the minimum node size); if it is, repeat the steps recursively on each child, otherwise make the node a leaf
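The core of step 2 is the information-gain computation for a candidate rule; a small sketch with illustrative names:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Entropy reduction from splitting the labels y by a boolean mask (rule satisfied / not satisfied)."""
    n = len(y)
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return entropy(y) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

# Toy example: the rule "feature 0 <= 0.5" separates the two classes perfectly, so the gain is 1 bit.
X = np.array([[0.2], [0.4], [0.7], [0.9]])
y = np.array([0, 0, 1, 1])
print(information_gain(y, X[:, 0] <= 0.5))   # 1.0
```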
7.3 Performance
I test the model using the wine quality dataset. The dataset contains 1599 samples of wine properties, such as acidity and
density, and their quality ratings, integers from 3 to 8. The task is to predict the quality rating from the properties. I trained
the model on 80 percent of the data and tested it on the other 20 percent. I calculated the training and testing accuracy
and compared the model's performance with the decision tree in the sklearn package (sklearn.tree.DecisionTreeClassifier).
The sklearn model is set to use entropy as the splitting criterion, and the rest of the settings are default. The
result is shown in the table below:
The train accuracy and test accuracy of both models are the same, suggesting my implementation is correct. We can see
that both models achieve perfect accuracy on the training set. This is because the restrictions on node
size (2) and tree depth (1000) are very loose. The tree can split the data until the classes are fully separated, overfitting to the training
data. This causes both models to have a fairly low test accuracy of 0.6.
8 RANDOM FOREST
8.1 Introduction
Random forest is an ensemble model that uses bagging to combine multiple independent decision trees. The
model is widely used in prediction tasks, as it requires very little configuration.
8.2 Algorithm
The model builds multiple decision trees. Each tree is limited to a random subset of splitting rules at every split, resulting
in a different tree on each build. The model generates its prediction by taking the mean or mode of all the decision trees'
predictions.
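A sketch of the prediction step for classification, combining the trees by majority vote; it assumes each tree object exposes a predict method, which is an assumption of this sketch rather than the notebook's interface.

```python
import numpy as np
from collections import Counter

def forest_predict(trees, X):
    """Combine the trees' predictions by majority vote (the mode) for classification."""
    all_preds = np.stack([tree.predict(X) for tree in trees])    # shape: (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in all_preds.T])
```

For a regression forest, the per-sample mean across trees would replace the vote.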
8.3 Performance
I perform the same test on the wine dataset as in the decision tree section. I compare the outcome with the random
forest model from sklearn (sklearn.ensemble.RandomForestClassifier). For my model, I set the number of splitting rules
and the number of features accessible at each split to the square root of the total number of rules / features. This prevents
the model from overfitting to the training data and generates distinct decision trees. For the sklearn model, I set the
splitting criterion to "entropy" to match the criterion used by my model and keep the other settings as default. I trained 100
decision trees for each model. The result is shown in the table below:
9 MATRIX FACTORIZATION
9.1 Introduction
Matrix factorization is a recommender system algorithm that generates latent features from the rating matrix. The model is
trained by gradient descent.
9.2 Algorithm
Let the number of features in the latent space be $l$. The goal of this algorithm is to generate a latent feature matrix $P$ and
a matrix $Q$ such that the rating matrix $R \approx PQ^T$. The shape of $R$ is (number of users, number of items). The shape of $P$ is
(number of users, $l$). The shape of $Q$ is (number of items, $l$). $P$ and $Q$ are derived by gradient descent. Given learning
rate $\alpha$ and regularization parameter $\lambda$, for each observation (user $u$ gives item $i$ rating $R_{u,i}$) we update $P$ and $Q$ following
the equations below:
$\hat{R}_{u,i} = \sum_{k=1}^{l} P_{u,k} Q_{i,k}$    (10)
$e = R_{u,i} - \hat{R}_{u,i}$    (11)
$P_u \leftarrow P_u + 2 \alpha e Q_i - \lambda P_u$    (12)
$Q_i \leftarrow Q_i + 2 \alpha e P_u - \lambda Q_i$    (13)
The model can also have a bias $b_u$ for each user, a bias $b_i$ for each item, and a global bias $b$, which is set to the overall
mean rating. We predict the rating with:
$\hat{R}_{u,i} = \sum_{k=1}^{l} P_{u,k} Q_{i,k} + b_u + b_i + b$    (14)
We update the biases with the following equations:
$b_u \leftarrow b_u + 2 \alpha e - \lambda b_u$    (15)
$b_i \leftarrow b_i + 2 \alpha e - \lambda b_i$    (16)
The model also uses an adaptive learning rate, as in the linear regression section.
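A minimal sketch of one stochastic gradient descent pass implementing equations (10) through (16); the learning rate and regularization values are placeholders, not the tuned hyperparameters from the report.

```python
import numpy as np

def mf_epoch(ratings, P, Q, bu, bi, b, alpha=0.01, lam=0.02):
    """One SGD pass over (user, item, rating) tuples using the update rules above.

    The global bias b is fixed to the overall mean rating and is not updated.
    """
    for u, i, r in ratings:
        pred = P[u] @ Q[i] + bu[u] + bi[i] + b          # eq. (14): predicted rating
        e = r - pred                                     # eq. (11): prediction error
        Pu = P[u].copy()                                 # snapshot so both factor updates use the same values
        P[u] += 2 * alpha * e * Q[i] - lam * Pu          # eq. (12)
        Q[i] += 2 * alpha * e * Pu - lam * Q[i]          # eq. (13)
        bu[u] += 2 * alpha * e - lam * bu[u]             # eq. (15)
        bi[i] += 2 * alpha * e - lam * bi[i]             # eq. (16)
    return P, Q, bu, bi
```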
9.3 Performance
The model is tested on the MovieLens Latest dataset. The dataset contains 100,000 ratings and 3,600 tag applications
applied to 9,000 movies by 600 users. The dense rating matrix is transformed into a list of tuples, each containing a user id, a
movie id, and a rating. The data is split into a train set and a test set with an 8:2 ratio. I tuned the model with and without
bias terms. The best set of hyperparameters and the results are shown below:
Result:
The model with bias performs better and converges faster. The errors are under the provided benchmarks
(1.1 for the model without bias and 1.0 for the model with bias).
A SOURCE CODE
• Linear Regression - Linear_Regression.ipynb
• Logistic Regression - Logistic_Regression.ipynb
• Naive Bayes - Naive_Bayes.ipynb
• K Means - KMeans.ipynb
• Gaussian Mixture - KMeans.ipynb
• Decision Tree - DecisionTree&RandomForest.ipynb
• Random Forest - DecisionTree&RandomForest.ipynb
• Matrix Factorization - Matrix_Factorization.ipynb
B DATASETS
• Wine-Quality Dataset: https://archive.ics.uci.edu/ml/datasets/wine+quality
• Balance Scale Dataset: https://archive.ics.uci.edu/ml/datasets/balance+scale
• MovieLens Latest Dataset: https://grouplens.org/datasets/movielens/
• Iris Dataset: https://archive.ics.uci.edu/ml/datasets/iris