Linear Regression Using Batch Gradient Descent
http://www.dsplog.com/2011/10/29/batch-gradient-descent/
I happened to stumble on Prof. Andrew Ng's Machine Learning classes, which are available online as part of the Stanford Center for Professional Development. The first lecture in the series discusses fitting a data set using linear regression. For understanding this concept, I chose to take data from the top 50 articles of this blog, based on the page views in the month of September 2011.
Notations
Let $m$ be the number of training examples (in our case, the top 50 articles), $x$ be the input sequence (the page index), $y$ be the output sequence (the page views for each page index) and $n$ be the number of features/parameters ($n = 2$ for our example). The $i$-th example of the training set is denoted as $(x^{(i)}, y^{(i)})$.
Let us try to predict the number of page views for a given page index using a hypothesis

$h_\theta(x) = \theta_0 + \theta_1 x_1 = \theta^T x$,

where $\theta = [\theta_0\ \theta_1]^T$ is the parameter vector and $x = [1\ x_1]^T$ carries a constant 1 as its first feature (so $n = 2$). Define the cost function, measuring how well the hypothesis fits the training set, as

$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$.

The scaling by the fraction $\frac{1}{2}$ is just for notational convenience. Let us start with some parameter vector $\theta$, and keep changing $\theta$ to reduce the cost function $J(\theta)$, i.e.

$\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial \theta_j}J(\theta) = \theta_j - \alpha\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$,

where $\alpha$ is the learning rate.
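As a concrete illustration of this notation, the training set and the cost function can be set up in Matlab/Octave roughly as follows. The page-view numbers here are made-up placeholders with a roughly exponential decay; the actual top-50 data from September 2011 is not reproduced in this capture.

% Hypothetical stand-in for the top-50 article data (page index vs page views).
m          = 50;                               % number of training examples
page_idx   = (1:m)';                           % input sequence x (page index)
page_views = round(5000*exp(-0.08*page_idx));  % made-up page views, exponential-ish trend
x = [ones(m,1) page_idx];                      % m x 2 matrix: constant feature + page index (n = 2)
y = page_views;                                % output sequence
theta_vec = [0 0]';                            % a candidate parameter vector [theta_0 theta_1]'
h_theta   = x*theta_vec;                       % hypothesis evaluated over the training set
j_theta   = 1/2*sum((h_theta - y).^2)          % cost function J(theta) for this candidate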
1. For each update of the parameter vector $\theta$, the algorithm processes the full training set. This algorithm is called Batch Gradient Descent.
2. For the given example with 50 training examples, going over the full training set is computationally feasible. However, when the training set is very large, we need to use a slight variant of the algorithm (stochastic gradient descent). We will discuss that in another post.
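The gradient-descent snippet from the original post is not visible in this capture, so the following is only a minimal sketch of the batch update, continuing from the hypothetical setup above. The value of alpha is assumed, and the extra 1/m scaling on the error term follows the discussion in the comments below (it can be absorbed into alpha).

alpha     = 0.001;          % learning rate (assumed value; see the notes on alpha below)
theta_vec = [0 0]';         % start from an all-zero parameter vector
num_steps = 10000;          % fixed number of iterations, as in the post
for k = 1:num_steps
    h_theta   = x*theta_vec;                     % hypothesis over the full training set
    grad      = x'*(h_theta - y);                % sum of errors over all m examples (batch)
    theta_vec = theta_vec - alpha*(1/m)*grad;    % batch gradient descent update
end
theta_vec                   % computed [theta_0 theta_1]'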
The computed $\theta$ values are:
With this hypothesis, the predicted page views are shown as the red curve in the plot below. In the Matlab code snippet, the number of gradient descent steps was blindly kept at 10000. One can probably stop the gradient descent when the cost function is sufficiently small and/or when its rate of change between iterations is small.
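One way to realize that stopping rule, instead of blindly running 10000 steps, is to exit once the cost changes by less than some tolerance between iterations. The tolerance and alpha below are assumed values, not taken from the post:

alpha = 0.001; tol = 1e-6;                % assumed learning rate and relative tolerance
theta_vec = [0 0]';
j_prev = inf;
for k = 1:10000                           % upper bound on the number of steps
    theta_vec = theta_vec - alpha*(1/m)*(x'*(x*theta_vec - y));
    j_curr = 1/(2*m)*sum((x*theta_vec - y).^2);   % cost after this update
    if abs(j_prev - j_curr) < tol*j_prev          % rate of change has become small
        break;
    end
    j_prev = j_curr;
end
plot(page_idx, y, 'b.', page_idx, x*theta_vec, 'r');   % measured points vs fitted line (red)
xlabel('page index'); ylabel('page views');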
Couple of things to note:
1. Given that the measured values show an exponential trend, trying to fit a straight line does not seem like a good idea. Anyhow, given that this is the first post in this series, I let it be.
2. The value of $\alpha$ controls the rate of convergence of the algorithm. If $\alpha$ is very small, the algorithm takes small steps and takes a longer time to converge. A higher value of $\alpha$ causes the algorithm to diverge (a small sketch after the plotting code below illustrates this).
3. Have not figured out how to select an $\alpha$ value suitable (fast convergence) for the data set under consideration. Will figure that out later.

Plotting the variation of the cost function $J(\theta)$ over a range of $\theta_0$ and $\theta_1$ values:
% x, y and m are the training data and example count defined earlier
j_theta = zeros(250, 250);                 % initialize j_theta
theta0_vals = linspace(-5000, 5000, 250);
theta1_vals = linspace(-200, 200, 250);
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
        h_theta = (x*theta_val_vec);
        j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
    end
end
figure;
surf(theta0_vals, theta1_vals, 10*log10(j_theta.'));
xlabel('theta_0'); ylabel('theta_1'); zlabel('10*log10(Jtheta)');
title('Cost function J(theta)');
figure;
contour(theta0_vals, theta1_vals, 10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');
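To see point 2 of the notes above in action, one can also track J(theta) over the iterations for two different step sizes (the alpha values here are assumed purely for illustration): a small alpha converges slowly, while a too-large alpha makes the cost blow up.

alphas  = [0.0005 0.005];                 % a "small" and a "too large" step size (assumed)
n_iter  = 50;
j_track = zeros(n_iter, length(alphas));
for a = 1:length(alphas)
    theta_vec = [0 0]';
    for k = 1:n_iter
        theta_vec    = theta_vec - alphas(a)*(1/m)*(x'*(x*theta_vec - y));
        j_track(k,a) = 1/(2*m)*sum((x*theta_vec - y).^2);
    end
end
figure; semilogy(j_track);
xlabel('iteration'); ylabel('J(theta)');
legend('small alpha (converges)', 'large alpha (diverges)');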
Given that the surf() plot is a bit unwieldy on my relatively slow desktop, using the contour() plot seems to be a much better choice. One can see that the minimum of this cost function lies near the $\theta$ values computed by the gradient descent.
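Rather than reading the minimum off the contour plot by eye, one can also pick it directly off the computed grid; this should land close to the $\theta$ returned by the gradient descent:

[j_min, idx]  = min(j_theta(:));              % smallest cost over the 250x250 grid
[i0, i1]      = ind2sub(size(j_theta), idx);  % row -> theta_0 index, column -> theta_1 index
theta0_at_min = theta0_vals(i0)
theta1_at_min = theta1_vals(i1)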
Please click here to SUBSCRIBE to the newsletter and download the FREE e-Book on probability of error in AWGN. Thanks for visiting! Happy learning.
Related posts:
Least Squares in Gaussian Noise

Tagged as: machine_learning
Comments

Bobaker Madi December 30, 2012 at 12:04 am
Thanks for this topic. I have the same question as Harneet. He meant: if we have random points, how can we apply this method? If we sort them, in this case we will get exactly a line or curve. Thanks again for your illustration.
Krishna Sankar January 2, 2013 at 6:15 am
@Bobaker: Ah, I understand. I have not played with random, unsorted data. Did you try making this data set random, and is it converging? I think it will still converge.
Paul October 6, 2012 at 5:06 am
Hi Krishna, Thank you very much for the article. In your Matlab/Octave code snippet, you have a 1/m factor in the expression for theta_vec. However, in the LaTeX formulae that precede it, this factor is missing. Could you clarify? Many thanks, Paul
Krishna Sankar October 6, 2012 at 6:39 am
@Paul: Nice observation. While toying with the Matlab code, I found that the gradient descent was not converging and hence put an additional scaling of 1/m on the error term. This 1/m term can be considered to be part of alpha and hence can be absent in the mathematical description. Agree?
Harneet Oberoi November 2, 2011 at 5:47 am
Hello, Thanks for sharing this. It is very helpful. My question is that the input and output data you have used is sorted from smallest to largest, whereas it follows an exponential distribution. Let me put it this way: out of the input (x) and output (y), only one is sorted and the other is random. Would really appreciate your help. Thanks, Harneet
Krishna Sankar November 7, 2011 at 4:56 am
@Harneet: Given my limited exposure to the topic, I am unable to understand your query. Can you please put across an example?
Deepak October 31, 2011 at 12:38 am
Thank you for a great article. Sir, I was wondering if you have planned any posts on SNR estimation techniques in the near future.