Stochastic Gradient Descent:
It is another variant of the Gradient Descent algorithm, used for optimizing machine learning models iteratively. It addresses the computational inefficiency of Batch Gradient Descent when dealing with large datasets.
In SGD, instead of using the entire dataset for each iteration, only a single random example is selected to calculate the gradient and update the model parameters. This random selection introduces randomness into the optimization process, hence the term "stochastic" in Stochastic Gradient Descent.
The path taken by Stochastic Gradient Descent looks as follows:

[Figure: path taken by Stochastic Gradient Descent toward the minimum]
Note: SGD is noisier than Batch Gradient Descent, often requiring more iterations to reach the minimum due to its randomness. Despite this, it is computationally more efficient, making it a preferred choice over Batch Gradient Descent for optimizing learning algorithms in most scenarios.
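In symbols, each SGD step uses the gradient of the loss on just one randomly picked example (x_i, y_i). Writing the model parameters as theta, the learning rate as eta, and the per-example loss as L, the generic update rule is:

$$\theta \;\leftarrow\; \theta - \eta \, \nabla_{\theta} L(\theta;\, x_i, y_i)$$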
Advantages of Stochastic Gradient Descent:
In Stochastic Gradient Descent (SGD), learning happens on every example, which gives it a few advantages over other gradient descent variants:
+ Memory Efficiency: It updates the parameters for each training example one at a time, so it is memory-efficient and can handle large datasets that do not fit into memory all at once.
+ Speed: It is relatively fast to compute compared to Batch Gradient Descent and Mini-Batch Gradient Descent, because it uses only one example to update the parameters.
+ Computationally Efficient: By using a single example, the computational cost per iteration is significantly reduced compared to Batch Gradient Descent, which must process the entire dataset (see the sketch after this list).
+ Avoidance of Local Minima: Due to the noisy updates in SGD, it has the ability to escape local minima and converge toward the global minimum.
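To make the efficiency point concrete, here is a minimal sketch contrasting one batch update with one stochastic update on synthetic data (the names X, y, theta and lr are illustrative and not part of this article's code):

import numpy as np

# Illustrative synthetic data: 100 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

theta = np.zeros(3)
lr = 0.01

# Batch gradient descent: one update uses the gradient over the entire dataset.
grad_batch = -2 * X.T @ (y - X @ theta) / len(y)
theta_batch = theta - lr * grad_batch

# Stochastic gradient descent: one update uses a single randomly chosen example.
i = rng.integers(0, len(y))
grad_sgd = -2 * X[i] * (y[i] - X[i] @ theta)
theta_sgd = theta - lr * grad_sgd

The batch step touches all 100 rows of X, while the stochastic step touches a single row, which is why SGD is cheaper per iteration and lighter on memory.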
Let's proceed to build an approximation class that will assist us in determining the beta values (coefficients and intercept) using Stochastic Gradient Descent for our Multiple Linear Regression model. I will use the Diabetes dataset to create our own StochasticGDRegressor and validate it against sklearn's SGDRegressor.
Importing Dataset
In this implementation I am using the Diabetes dataset from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
from sklearn.datasets import load_diabetes

inputs, target = load_diabetes(return_X_y=True)
print('inputs.shape:', inputs.shape)
print('target.shape:', target.shape)
inputs.shape: (442, 10)
target.shape: (442,)
Splitting data into train and test datasets
from sklearn.model_selection import train_test_split

train_inputs, test_inputs, train_target, test_target = train_test_split(inputs, target, test_size=0.2, random_state=...)

print('train_inputs:', train_inputs)
Note: Data Preprocessing - ensure that data preprocessing steps, such as normalization or standardization, are performed where needed. Discrepancies in data preprocessing can impact model convergence.
Here, I am not applying data standardization because the dataset is already in a similar range across all axes.
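If the features were on very different scales, a standardization step would typically come first. A minimal sketch using sklearn's StandardScaler (skipped in this article, as noted above):

from sklearn.preprocessing import StandardScaler

# Sketch only: fit the scaler on the training split and reuse it on the test
# split so that no test statistics leak into training.
scaler = StandardScaler()
train_inputs_scaled = scaler.fit_transform(train_inputs)
test_inputs_scaled = scaler.transform(test_inputs)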
Since Stochastic Gradient Descent requires values for the learning rate and the number of epochs, I am first applying sklearn's SGDRegressor as a reference implementation for our model.
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score

reg = SGDRegressor(max_iter=500, learning_rate='constant', eta0=0.01)
reg.fit(train_inputs, train_target)

SGDRegressor(learning_rate='constant', max_iter=500)
y_pred = reg.predict(test_inputs)
print('SGDRegressor Coefficients:', reg.coef_)
print('\n')
print('SGDRegressor Intercept:', reg.intercept_)
SGDRegressor Coefficients: [  53.00997234 -126.73396481  412.23622602  277.2729001   -23.54477637
  -66.58596436 -195.90104992  147.23460275  315.68136954  143.74333886]

SGDRegressor Intercept: [148.50841551]
r2_score(test_target, y_pred)

0.4508313570919322
Building our own Stochastic Gradient Descent Class
import numpy as np
I am using the first derivative of the Mean Squared Error (MSE) to drive convergence in the Stochastic Gradient Descent (SGD) algorithm. The first derivative, often referred to as the gradient, indicates the direction and magnitude of the steepest ascent of the cost function. By updating the model parameters (coefficients and intercept) in the opposite direction of the gradient, I aim to minimize the cost function.
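Concretely, for a single training example (x_i, y_i) with prediction y-hat_i, the per-example squared error and its gradients with respect to the intercept b and the coefficient vector beta are:

$$L_i = (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = x_i^{\top}\beta + b$$

$$\frac{\partial L_i}{\partial b} = -2\,(y_i - \hat{y}_i), \qquad \frac{\partial L_i}{\partial \beta} = -2\,(y_i - \hat{y}_i)\,x_i$$

These are exactly the intercept_derivative and coeff_derivative computed in the fit method below.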
Stochastic Gradient Descent Algorithm
1. Initialize Parameters: Randomly initialize the parameters of the model. Determine the number of iterations (epochs) and the learning rate for updating the parameters.
2. Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the maximum number of iterations:
a. Iterate over randomly selected training examples from the training dataset to introduce randomness.
b. Compute the gradient of the cost function with respect to the model parameters using the current training example.
c. Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.
d. Evaluate the convergence criteria, such as the change in the cost function between iterations.
3. Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
class StochasticGDRegressor():

    def __init__(self, learning_rate=0.01, epochs=200):
        self.coeff = None
        self.intept = None
        self.learning_rate = learning_rate
        self.epochs = epochs

    # creating "fit" function
    def fit(self, train_inputs, train_target):
        # In Multiple Linear Regression, it is advisable to start with intercept = 0 and coefficients = 1

        # starting with initializing intercept = 0
        self.intept = 0

        # starting with initializing coefficients = 1
        self.coeff = np.ones(train_inputs.shape[1])  # using train_inputs.shape[1] for the number of features

        # starting iteration loop
        for i in range(self.epochs):
            for j in range(train_inputs.shape[0]):
                # fetching the index randomly
                idx = np.random.randint(0, train_inputs.shape[0])

                # calculating the derivative of the intercept
                y_hat = np.dot(train_inputs[idx], self.coeff) + self.intept
                intercept_derivative = -2 * np.mean(train_target[idx] - y_hat)

                # updating the intercept
                self.intept = self.intept - (self.learning_rate * intercept_derivative)

                # calculating the derivative of the coefficients
                coeff_derivative = -2 * np.dot((train_target[idx] - y_hat), train_inputs[idx])

                # updating the coefficients
                self.coeff = self.coeff - (self.learning_rate * coeff_derivative)

    @property
    def coefficients(self):
        if self.coeff is not None:
            return self.coeff
        else:
            print('Model not fitted yet.')
            return None

    @property
    def intercept(self):
        if self.intept is not None:
            return self.intept
        else:
            print('Model not fitted yet.')
            return None

    # creating "predict" function
    def predict(self, test_inputs):
        return np.dot(test_inputs, self.coeff) + self.intept

    # R2 scoring for metric evaluation
    def score(self, test_inputs, test_target):
        predictions = self.predict(test_inputs)
        return r2_score(test_target, predictions)
sgd = StochasticGDRegressor(learning_rate=0.01, epochs=500)
sgd.fit(train_inputs, train_target)
print('StochasticGDRegressor Coefficients:', sgd.coefficients)
print('\n')
print('StochasticGDRegressor Intercept:', sgd.intercept)
StochasticGDRegressor Coefficients: [  43.00297577 -237.60014313   62.78410347  333.84767943 -125.88091605
  106.29575328 -205.01352308  153.03909436  423.57085728   55.00051245]

StochasticGDRegressor Intercept: 247.41704853952255
Additionally, using the R2 score as a regression metric is a good choice for evaluating the performance of the model. R2 measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A higher R2 score indicates better predictive performance.
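For reference, R2 compares the model's squared errors against a baseline that always predicts the mean of the targets:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$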
r2 = sgd.score(test_inputs, test_target)
print(f'R2 score on test data: {r2}')

R2 score on test data: 0.44582679448963036
The slight difference in performance between sklearn's SGDRegressor model and my custom Stochastic Gradient Descent (SGD) class implementation can be attributed to several factors:
1. Hyperparameter Tuning: The performance of SGDRegressor in sklearn may be influenced by its default hyperparameter settings, which are tuned to work well across typical datasets.
2. Random Initialization: My custom SGD implementation picks a random index for each gradient descent update, introducing randomness that can lead to different convergence paths and variations in the final model parameters.
3. Convergence Criteria: Differences in the number of epochs and convergence criteria could contribute to performance variations between the two implementations.
4. Learning Rate Schedule: sklearn's SGDRegressor supports several learning rate schedules, while I have not experimented with different schedules in my custom class.
5. Regularization: sklearn's SGDRegressor includes an L2 penalty by default, whereas my custom class does not currently implement any form of regularization (see the sketch below).
By systematically evaluating these factors, you can identify the specific reasons behind the performance differences and refine your custom SGD implementation accordingly.
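As a quick check of the regularization point, one option is to rerun sklearn's SGDRegressor with the penalty switched off. This is only a sketch; the exact argument (penalty=None vs penalty='none') depends on your scikit-learn version, and reg_no_penalty is an illustrative name:

from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score

# sklearn's SGDRegressor defaults to penalty='l2' with alpha=0.0001; switching
# the penalty off makes the comparison with the unregularized custom class more direct.
reg_no_penalty = SGDRegressor(max_iter=500, learning_rate='constant', eta0=0.01, penalty=None)
reg_no_penalty.fit(train_inputs, train_target)
print('R2 without regularization:', r2_score(test_target, reg_no_penalty.predict(test_inputs)))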
Disadvantages of Stochastic Gradient Descent (SGD)
1. Noisy Updates: The updates in SGD are noisy and have high variance, which can make the optimization process less stable and potentially lead to oscillations around the minimum.
2. Slow Convergence: Convergence in SGD may be slower because it updates parameters for each training example individually, requiring more iterations to reach the minimum.
3. Sensitivity to Learning Rate: The choice of learning rate is crucial in SGD. A high rate may cause overshooting, while a low rate can result in slow convergence, impacting the algorithm's performance.
[Figure: effect of the learning rate - too low (a small learning rate requires many updates before reaching the minimum), just right (the optimal learning rate swiftly reaches the minimum point), too high (too large a learning rate causes drastic updates which lead to divergent behavior)]
4. Less Accuracy: The noisy updates may prevent SGD from converging to the exact global minimum, yielding suboptimal solutions. Techniques like learning rate scheduling and momentum-based updates (sketched below) can help mitigate this issue.
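A minimal sketch of a momentum-style update; the names momentum_step, velocity and gamma are illustrative and not part of the custom class above:

import numpy as np

def momentum_step(theta, gradient, velocity, lr=0.01, gamma=0.9):
    # The velocity accumulates an exponentially decaying average of past gradients,
    # which smooths the noisy per-example updates of plain SGD.
    velocity = gamma * velocity + lr * gradient
    return theta - velocity, velocity

theta = np.zeros(3)
velocity = np.zeros(3)
gradient = np.array([0.5, -1.0, 0.2])   # placeholder gradient for illustration
theta, velocity = momentum_step(theta, gradient, velocity)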
Difference between Batch Gradient Descent and Stochastic Gradient
Descent
+ Batch Gradient Descent computes the gradient on the entire training dataset for every parameter update, so each step is computationally expensive but the path to the minimum is smooth.
+ Stochastic Gradient Descent computes the gradient on a single randomly selected example per update, so each step is cheap but the path is noisy and may oscillate around the minimum.
Note: I have built a custom class to facilitate a better understanding of Stochastic Gradient Descent. For the development of your own models, however, I would recommend using the scikit-learn library.
Stay tuned for Polynomial Regression, and don't forget to star this GitHub repository for more such content and consider sharing it with others.