Gradient Descent: Optimizer Behind Machine Learning

Today, we will discuss one of the most important topics in machine learning: Gradient Descent. Gradient descent is a way for our programs to learn and improve by iteratively getting better at what they do.

At its core, Gradient Descent is an optimization algorithm used to minimize a loss function by iteratively moving towards the minimum.

Understanding the Intuition

Imagine you are blindfolded on a hill, trying to reach the bottom where the surface is flat. Because you are blindfolded, you rely on feeling the slope beneath your feet to guide your steps. This is exactly like Gradient Descent in machine learning.
In this scenario:

The Hill: the mathematical function of our model (the loss function).

The Hiker: the model, which tries to find the best parameters.

Feeling the Slope: calculating the gradient of the loss function.

Descending: the steps taken to reach the bottom of the hill.

Why Gradient Descent over OLS

In our previous discussion, we used the OLS method to find the optimized values of the parameters, so why use Gradient Descent, which only provides approximate values? With OLS, calculating the coefficients requires finding the inverse of (X^TX), which can be computationally expensive: matrix inversion has a time complexity of roughly n^3, which is too much for large problems. Because of that cost, we use other methods to approximate the values of the coefficients.
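To see where that cost comes from, here is a minimal sketch of the closed-form (normal equation) solution; the toy data and the column-of-ones setup are assumptions added purely for illustration.

import numpy as np

# Toy data: one feature, a handful of samples (illustrative values only)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Add a column of ones so the intercept is learned as a coefficient
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal equation: theta = (X^T X)^(-1) X^T y
# The matrix inversion here is the expensive step (cubic in the matrix size)
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

print(theta)  # [intercept, slope]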

Requirements for Functions

Before delving into Gradient Descent, it’s crucial to understand the requirements for functions that this algorithm can work with.

Differentiability: The function must have a derivative at each point in its domain.

Convexity: A function is convex if it curves upward, like a bowl, and the line segment connecting any two points on its graph lies on or above the graph itself. This guarantees a single global minimum, so Gradient Descent cannot get stuck in a local minimum. For example, f(x) = x^2 is both differentiable and convex, with its single global minimum at x = 0.
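As a quick illustration of that chord condition, the short check below compares the height of a chord's midpoint with the function value for the convex function f(x) = x^2; the sample points are arbitrary and chosen only for this sketch.

import numpy as np

def f(x):
    return x ** 2  # a convex, differentiable function

# Pick any two points on the graph and compare the chord with the function
a, b = np.random.uniform(-10, 10, size=2)
chord_midpoint = (f(a) + f(b)) / 2   # height of the line segment at the midpoint
function_value = f((a + b) / 2)      # height of the function at the midpoint

# For a convex function the chord never dips below the graph
print(function_value <= chord_midpoint)  # always True for f(x) = x^2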

Continuing with Gradient Descent

Now let's get back to our main topic and talk about Gradient Descent. As mentioned earlier, Gradient Descent is an optimization algorithm used to minimize a function by iteratively moving towards the minimum. Here's how it works.

Basic Idea

1. Start at a Point: Begin at a random point on the function or surface.

2. Calculate the Gradient: Calculate the gradient (derivative) of the function at that point. The gradient points in the direction of the steepest increase of the function.

3. Move Opposite to the Gradient: Take a small step in the opposite direction of the gradient to decrease the function value.

4. Update Position: Update the position on the function based on the update rule shown after this list.

5. Repeat: Repeat this process until convergence or the stopping criteria are met.
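The update rule referenced in step 4 is the standard Gradient Descent step, the same one applied in the code later in this article:

new_parameter = old_parameter - learning_rate * gradient

Here the learning rate is a small positive number that controls how big each step is, and the gradient is the derivative of the loss with respect to that parameter.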

Stopping Criteria: we stop when there is no meaningful improvement in the parameters, i.e. the difference between the old and new parameter values is negligible.
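Putting the steps and the stopping criterion together, here is a minimal sketch of Gradient Descent on the simple convex function f(x) = x^2; the starting point, learning rate, and tolerance are arbitrary choices made for this illustration.

def gradient(x):
    return 2 * x  # derivative of f(x) = x^2

x = 10.0          # step 1: start at an arbitrary point
lr = 0.1          # learning rate
tolerance = 1e-6  # stopping threshold

while True:
    x_new = x - lr * gradient(x)      # steps 2-4: move opposite to the gradient
    if abs(x_new - x) < tolerance:    # stopping criterion: negligible change
        break
    x = x_new                         # step 5: repeat

print(x)  # close to 0, the minimum of f(x) = x^2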

Let's learn Gradient Descent by applying it to simple linear regression

In this example, we will use a dataset generated with 'make_regression' from 'sklearn' to create a linear relationship between a single feature and a target variable, with some added noise. Our goal is to find the optimal value of the intercept.

The red line in the plot represents the best-fit line calculated using the OLS method. It will serve as a reference for comparing OLS with Gradient Descent.
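The snippets below assume a setup roughly like the following; the exact arguments to make_regression (for example n_samples=4 and the noise level) are assumptions chosen to match the shapes used later, not values from the original experiment.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Single feature, small sample, linear relationship with some noise
X, y = make_regression(n_samples=4, n_features=1, noise=10, random_state=42)

# OLS baseline used as the red reference line in the plots
reg = LinearRegression()
reg.fit(X, y)

plt.scatter(X, y)
plt.plot(X, reg.predict(X), color='red', label='OLS')
plt.legend()
plt.show()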

Gradient Descent

Now, let's dive into Gradient Descent. In our example, let's assume an initial value of slope (m) = 78.35 and intercept (b) = 0. We will apply Gradient Descent to update the intercept (b) and plot the resulting line.

We will calculate the slope of the loss with respect to the intercept using the derivative of the loss function (we will discuss the detailed maths in the next articles):
loss_slope = -2 * np.sum(y - m * X.ravel() - b)

m = 78.35
b = 0

# Calculate loss slope
loss_slope = -2 * np.sum(y - m * X.ravel() - b)

# Learning rate
lr = 0.1

# Update intercept using Gradient Descent
step_size = loss_slope * lr
b = b - step_size

# Predicted values with updated intercept
y_pred = (m * X + b).reshape(4)

# Plotting
plt.scatter(X, y)
plt.plot(X, reg.predict(X), color='red', label='OLS')
plt.plot(X, y_pred, color='#00a65a', label='b = {}'.format(b))
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Linear Regression with Gradient Descent (Iteration 1)')
plt.show()

In the plot, the green line represents the linear regression line with the intercept (b) updated by one iteration of Gradient Descent. We can see it has started moving towards the OLS line (red), i.e. towards a better fit.

Iterative Updates

Now, let's continue the process for multiple iterations to see how the line moves. We will iterate through the Gradient Descent process multiple times, updating the intercept (b) based on the gradient of the loss.

b = -100
m = 78.35
lr = 0.01

epochs = 100

for i in range(epochs):
    # Gradient of the loss with respect to the intercept
    loss_slope = -2 * np.sum(y - m * X.ravel() - b)
    # Update the intercept and plot the line for this iteration
    b = b - (lr * loss_slope)
    y_pred = m * X + b
    plt.plot(X, y_pred)

plt.scatter(X, y)
plt.show()

In the final plot, we can see the regression line evolving towards the optimal values with each iteration.

Conclusion

Gradient Descent is a powerful optimization technique used in various machine learning algorithms, especially in cases where the cost function is complex and not easily minimized through analytical methods like OLS. By iteratively updating parameters, such as the intercept (b) and slope (m) in linear regression, Gradient Descent gradually moves the model towards the values that minimize the loss.

Next, we will discuss in detail the mathematics behind this and fill in the gaps that were left out today. Stay tuned!
