---
title: "Gradient Descent Visualizer"
subtitle: "LAB02: Machine Learning Foundations"
format:
  html:
    code-fold: true
---
## Interactive 3D Loss Surface
This simulation visualizes how gradient descent navigates a loss surface to find optimal parameters.
::: {.callout-note}
## Concept from LAB02
See **Section 2.3: Optimization** in the [PDF book](../downloads/Edge-Analytics-Lab-Book-v1.0.0.pdf) for the mathematical foundations.
:::
## The Visualization
```{python}
#| label: fig-gradient-descent
#| fig-cap: "Gradient descent on a 2D loss surface"
#| code-fold: true
import numpy as np
import matplotlib.pyplot as plt

# Loss surface: a quadratic bowl with sinusoidal ripples
def loss_function(x, y):
    return x**2 + y**2 + 0.5 * np.sin(3 * x) + 0.5 * np.cos(3 * y)

# Generate surface data
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = loss_function(X, Y)

# Analytic gradient of the loss
def gradient(x, y):
    dx = 2 * x + 1.5 * np.cos(3 * x)
    dy = 2 * y - 1.5 * np.sin(3 * y)
    return dx, dy

# Run gradient descent from (1.5, 1.5)
path_x, path_y, path_z = [1.5], [1.5], [loss_function(1.5, 1.5)]
lr = 0.1
for _ in range(50):
    dx, dy = gradient(path_x[-1], path_y[-1])
    new_x = path_x[-1] - lr * dx
    new_y = path_y[-1] - lr * dy
    path_x.append(new_x)
    path_y.append(new_y)
    path_z.append(loss_function(new_x, new_y))

# Create figure
fig = plt.figure(figsize=(12, 5))

# 3D surface plot
ax1 = fig.add_subplot(121, projection='3d')
ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.7, edgecolor='none')
ax1.plot(path_x, path_y, path_z, 'r.-', linewidth=2, markersize=8, label='GD path')
ax1.scatter([path_x[0]], [path_y[0]], [path_z[0]], color='green', s=100, label='Start')
ax1.scatter([path_x[-1]], [path_y[-1]], [path_z[-1]], color='red', s=100, label='End')
ax1.set_xlabel('Parameter 1')
ax1.set_ylabel('Parameter 2')
ax1.set_zlabel('Loss')
ax1.set_title('3D Loss Surface')
ax1.legend()

# Contour plot
ax2 = fig.add_subplot(122)
contour = ax2.contour(X, Y, Z, levels=20, cmap='viridis')
ax2.plot(path_x, path_y, 'r.-', linewidth=2, markersize=8)
ax2.scatter([path_x[0]], [path_y[0]], color='green', s=100, zorder=5, label='Start')
ax2.scatter([path_x[-1]], [path_y[-1]], color='red', s=100, zorder=5, label='End')
ax2.set_xlabel('Parameter 1')
ax2.set_ylabel('Parameter 2')
ax2.set_title('Contour View')
ax2.legend()
plt.colorbar(contour, ax=ax2, label='Loss')
plt.tight_layout()
plt.show()
```
## Understanding the Visualization
### Loss Surface
The colored surface represents the **loss function** $L(\theta_1, \theta_2)$; in this demo it is $L(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2 + 0.5\sin(3\theta_1) + 0.5\cos(3\theta_2)$, a quadratic bowl with sinusoidal ripples. Lower values (darker colors) indicate better parameter combinations.
### Gradient Descent Path
The red dots show the path taken by gradient descent:
1. **Start** (green): Initial parameters (fixed at $(1.5, 1.5)$ here; in practice often chosen randomly)
2. **Steps**: Each step moves in the direction of steepest descent
3. **End** (red): Final parameters after convergence
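
The demo above simply runs a fixed 50 iterations. A common stopping criterion (not part of the lab code; shown as a sketch) is to halt once the gradient norm falls below a tolerance, since a near-zero gradient means further steps barely move the parameters:

```python
import numpy as np

# Same toy surface and analytic gradient as the figure above
def loss(x, y):
    return x**2 + y**2 + 0.5 * np.sin(3 * x) + 0.5 * np.cos(3 * y)

def grad(x, y):
    return 2 * x + 1.5 * np.cos(3 * x), 2 * y - 1.5 * np.sin(3 * y)

x, y, lr, tol = 1.5, 1.5, 0.1, 1e-6
for step in range(1000):
    dx, dy = grad(x, y)
    if np.hypot(dx, dy) < tol:  # gradient ~ zero: at a stationary point
        break
    x, y = x - lr * dx, y - lr * dy

print(f"stopped after {step} steps at ({x:.3f}, {y:.3f})")
```

On this surface the loop stops well before the 1000-step cap.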
### The Update Rule
At each step, parameters update according to:
$$\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$$
where:
- $\eta$ is the **learning rate** (controls step size)
- $\nabla L$ is the **gradient** (direction of steepest ascent)
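
As a concrete numeric check of the rule, take a toy 1-D loss (not the lab's surface): with $L(\theta) = \theta^2$, $\nabla L(\theta) = 2\theta$, $\eta = 0.1$, and $\theta_0 = 1.0$, one step gives $\theta_1 = 1.0 - 0.1 \cdot 2.0 = 0.8$:

```python
eta = 0.1          # learning rate
theta = 1.0        # current parameter
grad = 2 * theta   # gradient of L(θ) = θ², evaluated at θ = 1.0
theta = theta - eta * grad  # θ ← 1.0 - 0.1 · 2.0
print(theta)  # 0.8
```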
## Experiment: Learning Rate
```{python}
#| label: fig-learning-rates
#| fig-cap: "Effect of different learning rates"
#| code-fold: true
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
learning_rates = [0.01, 0.1, 0.5]
titles = ['Too Small (0.01)', 'Good (0.1)', 'Too Large (0.5)']
for ax, lr, title in zip(axes, learning_rates, titles):
    # Run gradient descent
    path_x, path_y = [1.5], [1.5]
    for _ in range(50):
        dx, dy = gradient(path_x[-1], path_y[-1])
        new_x = path_x[-1] - lr * dx
        new_y = path_y[-1] - lr * dy
        # Clip to prevent explosion
        new_x = np.clip(new_x, -3, 3)
        new_y = np.clip(new_y, -3, 3)
        path_x.append(new_x)
        path_y.append(new_y)
    # Plot
    ax.contour(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
    ax.plot(path_x, path_y, 'r.-', linewidth=1, markersize=4)
    ax.scatter([path_x[0]], [path_y[0]], color='green', s=100, zorder=5)
    ax.scatter([path_x[-1]], [path_y[-1]], color='red', s=100, zorder=5)
    ax.set_title(title)
    ax.set_xlabel('θ₁')
    ax.set_ylabel('θ₂')
    ax.set_xlim(-2.5, 2.5)
    ax.set_ylim(-2.5, 2.5)
plt.tight_layout()
plt.show()
```
### Observations
| Learning Rate | Behavior |
|---------------|----------|
| **Too small** (0.01) | Slow convergence, may not reach minimum |
| **Good** (0.1) | Smooth convergence to minimum |
| **Too large** (0.5) | Oscillation, may diverge |
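
The table can also be checked numerically by comparing the final loss each learning rate reaches after the same 50 steps. A self-contained sketch (repeating the surface and gradient definitions so it runs on its own):

```python
import numpy as np

def loss(x, y):
    return x**2 + y**2 + 0.5 * np.sin(3 * x) + 0.5 * np.cos(3 * y)

def grad(x, y):
    return 2 * x + 1.5 * np.cos(3 * x), 2 * y - 1.5 * np.sin(3 * y)

for lr in (0.01, 0.1, 0.5):
    x, y = 1.5, 1.5
    for _ in range(50):
        dx, dy = grad(x, y)
        # Same clipping as the figure, so a too-large step cannot explode
        x = np.clip(x - lr * dx, -3, 3)
        y = np.clip(y - lr * dy, -3, 3)
    print(f"lr={lr:<5} final loss = {loss(x, y):.4f}")
```

With this budget, lr = 0.1 ends at a lower loss than lr = 0.01, which has not yet reached the minimum.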
## Try It Yourself
::: {.callout-tip}
## Exercise
1. Open the [LAB02 notebook](https://github.com/ngcharithperera/edge-analytics-lab-book/blob/main/notebooks/LAB02_ml_foundations.ipynb) in Colab
2. Modify the learning rate and observe convergence
3. Try different starting points
4. Compare SGD with Adam optimizer
:::
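
For the last exercise item, Adam is not implemented on this page; a rough reference sketch follows (the standard Adam update with bias correction, applied to the same toy surface; the `b1`, `b2`, and `eps` values are the usual defaults, not values from the lab book):

```python
import numpy as np

def loss(x, y):
    return x**2 + y**2 + 0.5 * np.sin(3 * x) + 0.5 * np.cos(3 * y)

def grad(theta):
    x, y = theta
    return np.array([2 * x + 1.5 * np.cos(3 * x),
                     2 * y - 1.5 * np.sin(3 * y)])

theta = np.array([1.5, 1.5])
m, v = np.zeros(2), np.zeros(2)          # running 1st/2nd moment estimates
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8  # common Adam defaults

for t in range(1, 201):
    g = grad(theta)
    m = b1 * m + (1 - b1) * g            # momentum-like average of gradients
    v = b2 * v + (1 - b2) * g**2         # average of squared gradients
    m_hat = m / (1 - b1**t)              # correct the bias from zero init
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta, loss(*theta))
```

Unlike plain gradient descent, the effective per-parameter step size adapts to the running gradient statistics.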
## Key Takeaways
1. **Gradient descent** follows the direction of steepest descent
2. **Learning rate** is crucial: too small = slow, too large = unstable
3. **Local minima** can trap the optimizer (advanced optimizers help)
4. **Momentum** helps escape saddle points and smooth convergence
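
The momentum takeaway can be sketched with the classical heavy-ball update (a standard formulation, assumed here rather than taken from the lab book): a velocity term accumulates a decaying sum of past gradients, which smooths the path and carries the iterate through flat regions:

```python
import numpy as np

def loss(x, y):
    return x**2 + y**2 + 0.5 * np.sin(3 * x) + 0.5 * np.cos(3 * y)

def grad(theta):
    x, y = theta
    return np.array([2 * x + 1.5 * np.cos(3 * x),
                     2 * y - 1.5 * np.sin(3 * y)])

theta = np.array([1.5, 1.5])
velocity = np.zeros(2)
lr, beta = 0.05, 0.9  # beta: fraction of the previous velocity carried over

for _ in range(200):
    velocity = beta * velocity - lr * grad(theta)
    theta = theta + velocity

print(theta, loss(*theta))
```

Setting `beta = 0` recovers plain gradient descent.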
## Related Sections in PDF Book
- Section 2.3: Optimization Methods
- Section 2.4: Backpropagation
- Exercise 2.1: Implement gradient descent from scratch