Learn optimization techniques from fundamental mathematical concepts to advanced machine learning applications. Designed by IIT faculty for students at all levels.
Derivatives, gradients, Hessians, and convexity concepts explained with visual intuition
Linear programming, gradient descent, Newton's method, and constraint handling
Optimization in neural networks, hyperparameter tuning, and model selection
Python code examples, real datasets, and hands-on programming exercises
In optimization, derivatives tell us the rate of change of a function. For data science, this helps us understand how small changes in our model parameters affect the loss function.
For a function $f(x) = x^2 + 2x + 1$, the derivative is:
$f'(x) = 2x + 2$
This tells us the slope at any point $x$.
import numpy as np

# Define the function and its derivative
def f(x):
    return x**2 + 2*x + 1

def f_prime(x):
    return 2*x + 2

# Analytical gradient at a point
x = 1.0
gradient = f_prime(x)
print(f"Gradient at x={x}: {gradient}")

# Numerical gradient via central differences (for verification);
# h is small, but not so small that rounding error dominates
h = 1e-5
numerical_gradient = (f(x + h) - f(x - h)) / (2 * h)
print(f"Numerical gradient: {numerical_gradient}")
Convex functions are the "nice" functions in optimization. They have a single global minimum, making optimization algorithms reliable and predictable.
A function $f$ is convex if for any two points $x_1, x_2$ and any $\lambda \in [0,1]$:
$f(\lambda x_1 + (1-\lambda) x_2) \leq \lambda f(x_1) + (1-\lambda) f(x_2)$
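As a quick numerical sanity check (a minimal sketch assuming only NumPy), the inequality can be tested on random point pairs and random values of $\lambda$ for the quadratic $f(x) = x^2 + 2x + 1$ used earlier; since that function is convex, no violations should appear.

import numpy as np

def f(x):
    return x**2 + 2*x + 1

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    lam = rng.uniform(0, 1)
    # Convexity: f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2)
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    if lhs > rhs + 1e-12:   # small tolerance for floating-point error
        violations += 1

print(f"Violations of the convexity inequality: {violations}")  # expected: 0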
In real-world problems, we often have constraints on our variables. Understanding how to handle these is crucial for practical optimization.
Equality constraints, $g(x) = 0$: the solution must satisfy these exactly
Inequality constraints, $h(x) \leq 0$: the solution must lie in the feasible region
For constrained optimization, we use the Lagrangian (written here for a single equality constraint $g(x) = 0$):
$L(x, \lambda) = f(x) + \lambda\, g(x)$
The optimal solution satisfies: $\nabla_x L = 0$ and $g(x) = 0$
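For a concrete feel, here is a minimal sketch of an equality-constrained problem solved numerically with SciPy's SLSQP solver; the objective and constraint below are illustrative choices rather than an example from the text, and the solver handles the Lagrange-multiplier machinery internally.

import numpy as np
from scipy.optimize import minimize

# Illustrative objective: minimize (x - 1)^2 + (y - 2)^2
def objective(v):
    x, y = v
    return (x - 1)**2 + (y - 2)**2

# Equality constraint g(x, y) = x + y - 1 = 0
constraints = [{"type": "eq", "fun": lambda v: v[0] + v[1] - 1}]

result = minimize(objective, x0=np.array([0.0, 0.0]),
                  method="SLSQP", constraints=constraints)

print("Optimal point:", result.x)    # approximately [0., 1.]
print("Optimal value:", result.fun)  # approximately 2.0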
Gradient descent: the workhorse of machine learning optimization (sketched after this list, alongside Newton's method)
Newton's method: faster convergence using second-order information
Adaptive learning-rate methods for modern deep learning
Global optimization methods for non-convex problems
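As a minimal sketch of the first two methods, reusing the quadratic $f(x) = x^2 + 2x + 1$ from earlier as the objective: plain gradient descent repeatedly steps against the gradient with a fixed learning rate, while Newton's method divides the step by the second derivative and therefore minimizes a quadratic in a single step.

def f_prime(x):
    return 2*x + 2          # first derivative of f(x) = x**2 + 2*x + 1

def f_double_prime(x):
    return 2.0              # second derivative (constant for a quadratic)

# Gradient descent: x <- x - lr * f'(x)
x = 5.0
lr = 0.1
for _ in range(100):
    x = x - lr * f_prime(x)
print(f"Gradient descent estimate: {x:.6f}")   # converges toward -1

# Newton's method: x <- x - f'(x) / f''(x)
x = 5.0
for _ in range(5):
    x = x - f_prime(x) / f_double_prime(x)
print(f"Newton's method estimate:  {x:.6f}")   # exactly -1 after one step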
Let's see how optimization works in the simplest machine learning model.
We want to find the best line $y = wx + b$ to fit our data points.
Loss function (Mean Squared Error): $L(w,b) = \frac{1}{n}\sum_{i=1}^n (y_i - wx_i - b)^2$
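Differentiating the loss gives the gradients $\frac{\partial L}{\partial w} = -\frac{2}{n}\sum_{i=1}^n (y_i - wx_i - b)x_i$ and $\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^n (y_i - wx_i - b)$, which gradient descent follows downhill. Below is a minimal sketch on synthetic data; the true line $y = 3x + 2$ and the noise level are illustrative choices, not data from the text.

import numpy as np

# Synthetic data (illustrative): y = 3x + 2 plus noise
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 2 + 0.1 * rng.standard_normal(100)

w, b = 0.0, 0.0
lr = 0.1
for _ in range(1000):
    residual = y - w * x - b
    grad_w = -2 * np.mean(residual * x)   # dL/dw
    grad_b = -2 * np.mean(residual)       # dL/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"Learned w={w:.3f}, b={b:.3f}")    # should approach w = 3, b = 2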
Understanding how backpropagation uses optimization to train neural networks.
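To make this concrete, here is a minimal, self-contained sketch (assuming only NumPy, with synthetic data) of a one-hidden-layer network trained by gradient descent; the gradients are computed layer by layer with the chain rule, which is exactly what backpropagation automates in deep learning frameworks. The architecture, target function, and learning rate below are illustrative choices.

import numpy as np

# Synthetic regression task: learn y = sin(pi * x) on [-1, 1]
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(np.pi * X)

# Parameters of a 1-16-1 network with tanh hidden units
W1 = 0.5 * rng.standard_normal((1, 16))
b1 = np.zeros(16)
W2 = 0.5 * rng.standard_normal((16, 1))
b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)              # hidden activations
    y_hat = h @ W2 + b2                   # network output
    loss = np.mean((y_hat - y)**2)        # MSE loss

    # Backward pass (chain rule, layer by layer)
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    dh = d_yhat @ W2.T
    dz = dh * (1 - h**2)                  # derivative of tanh
    dW1 = X.T @ dz
    db1 = dz.sum(axis=0)

    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"Final training loss: {loss:.4f}")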
Finding the best hyperparameters is itself an optimization problem!
Grid search: exhaustively search over a grid of hyperparameter values (sketched after this list)
Random search: randomly sample from hyperparameter distributions
Bayesian optimization: use a probabilistic model of the objective to guide the search
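Assuming scikit-learn is installed, the first two strategies can be sketched in a few lines; Bayesian optimization usually relies on a dedicated library and is not shown here. The data set and parameter ranges below are illustrative choices.

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic classification data (illustrative)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Grid search: exhaustively evaluate every value of C below
grid = GridSearchCV(model, {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("Grid search best:", grid.best_params_, grid.best_score_)

# Random search: sample C from a log-uniform distribution instead
rand = RandomizedSearchCV(model, {"C": loguniform(1e-3, 1e2)},
                          n_iter=10, cv=5, random_state=0)
rand.fit(X, y)
print("Random search best:", rand.best_params_, rand.best_score_)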
Markowitz portfolio theory: maximizing return while minimizing risk.
Minimize portfolio variance: $\min \frac{1}{2} w^T \Sigma w$
Subject to: $\sum w_i = 1$ (budget constraint) and $\mu^T w \geq r_{target}$ (return constraint)
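This is a quadratic program with one equality and one inequality constraint, so it can be sketched directly with SciPy's SLSQP solver. The covariance matrix $\Sigma$, expected returns $\mu$, and target return below are made-up illustrative numbers, not real market data.

import numpy as np
from scipy.optimize import minimize

# Illustrative inputs (not real market data)
Sigma = np.array([[0.10, 0.02, 0.04],
                  [0.02, 0.08, 0.01],
                  [0.04, 0.01, 0.12]])   # covariance of asset returns
mu = np.array([0.06, 0.05, 0.09])        # expected returns
r_target = 0.07                          # required portfolio return

def portfolio_variance(w):
    return 0.5 * w @ Sigma @ w

constraints = [
    {"type": "eq",   "fun": lambda w: np.sum(w) - 1},      # budget constraint
    {"type": "ineq", "fun": lambda w: mu @ w - r_target},   # return constraint (>= 0)
]

w0 = np.ones(3) / 3                       # start from the equal-weight portfolio
result = minimize(portfolio_variance, w0, method="SLSQP", constraints=constraints)

print("Optimal weights:", np.round(result.x, 3))
print("Expected return:", round(float(mu @ result.x), 4))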
Find the minimum of $f(x) = x^4 - 4x^3 + 6x^2 - 4x + 1$
Minimize $f(x,y) = x^2 + y^2$ subject to $x + y = 1$
Implement gradient descent for logistic regression
Which algorithm typically converges faster on a smooth convex problem, gradient descent or Newton's method, and why?