Lecture Note
University: Stanford University
Course: CS229 | Machine Learning
Academic year: 2023
Vectorization-Based Gradient Descent for Multiple Linear Regression

In machine learning, multiple linear regression is a powerful technique for modeling the relationship between a dependent variable and several independent variables. This post looks closely at how gradient descent is applied to multiple linear regression with vectorization, combining the two ideas into a single, efficient method.

Multiple Linear Regression

Multiple linear regression is a statistical technique that models a dependent variable as a function of several independent variables. The goal is to predict a continuous dependent variable from the values of the independent variables. To write the model concisely in vector notation, we collect the coefficients of the independent variables into a vector w and the intercept into a scalar b. The model is

f_{w,b}(x) = w_1 x_1 + \dots + w_n x_n + b

Cost Function

The cost function J(w,b) measures the discrepancy between the actual value of the dependent variable and the value predicted by the model. Gradient descent seeks to minimize this cost by updating the parameters w and b until the cost function reaches a minimum.

Gradient Descent

Gradient descent is the optimization procedure used to minimize the cost function. Each parameter w_j and the bias b are updated repeatedly until the cost function is minimized. The update rules are

w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w,b)
b = b - \alpha \frac{\partial}{\partial b} J(w,b)

where \alpha is the learning rate, \frac{\partial}{\partial w_j} J(w,b) is the derivative of the cost function with respect to w_j, and \frac{\partial}{\partial b} J(w,b) is the derivative of the cost function with respect to b.

Gradient Descent for Multiple Regression

For multiple linear regression, the update rule is applied for every j = 1, 2, ..., n, where n is the number of independent variables:

w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w,b)

The derivative of J with respect to w_j is

\frac{\partial}{\partial w_j} J(w,b) = (f_{w,b}(x) - y) \cdot x_j

where f_{w,b}(x) is the predicted value, y is the actual value, and x_j is the j-th independent variable; for the usual squared-error cost this term is averaged over all training examples.
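Since the note emphasizes vectorization, here is a minimal NumPy sketch of these updates. The function name gradient_descent, the hyperparameters alpha and num_iters, and the synthetic data in the usage example are illustrative assumptions, not part of the original note; the sketch assumes the standard mean-squared-error cost, so all the partial derivatives with respect to w are computed in one matrix-vector product instead of a loop over j.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Vectorized gradient descent for multiple linear regression.

    X : (m, n) matrix of independent variables
    y : (m,) vector of actual values
    Returns the learned weight vector w and bias b.
    """
    m, n = X.shape
    w = np.zeros(n)   # weights w_1 ... w_n
    b = 0.0           # intercept b

    for _ in range(num_iters):
        # Predicted values f_{w,b}(x) = X w + b for all m examples at once
        f = X @ w + b
        err = f - y                   # prediction errors

        # Gradients of the mean-squared-error cost J(w, b)
        dj_dw = (X.T @ err) / m       # shape (n,): averaged (f - y) * x_j
        dj_db = err.mean()            # derivative with respect to b

        # Simultaneous parameter updates
        w -= alpha * dj_dw
        b -= alpha * dj_db

    return w, b


if __name__ == "__main__":
    # Tiny synthetic example: y = 2*x1 + 3*x2 + 1 (assumed data for illustration)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = 2 * X[:, 0] + 3 * X[:, 1] + 1
    w, b = gradient_descent(X, y, alpha=0.1, num_iters=2000)
    print(w, b)   # should be close to [2, 3] and 1
```

Computing the gradient as X.T @ err / m is what the vectorization refers to here: all m predictions and all n partial derivatives come from a couple of matrix operations, and w and b are updated simultaneously rather than one coordinate at a time.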