This appendix covers the linear algebra fundamentals needed to understand machine learning algorithms in the book.
Only basic familiarity with vectors and matrices is assumed; no advanced math is required.
A vector is an ordered list of numbers:
\[\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\]
Addition: Element-wise \[\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}\]
Scalar multiplication: \[c \cdot \mathbf{x} = \begin{bmatrix} c \cdot x_1 \\ c \cdot x_2 \\ \vdots \\ c \cdot x_n \end{bmatrix}\]
Dot product: \[\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n\]
```python
import numpy as np

# Create vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(f"x + y = {x + y}")
print(f"2 * x = {2 * x}")
print(f"x · y = {np.dot(x, y)}")
```

```
x + y = [5 7 9]
2 * x = [2 4 6]
x · y = 32
```
A matrix is a 2D array of numbers:
\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\]
For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\):
\[\mathbf{y} = \mathbf{A}\mathbf{x}\]
where \(y_i = \sum_{j=1}^{n} a_{ij} x_j\)
In neural networks: This is exactly what a fully-connected layer does!
```python
# Matrix-vector multiplication
A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2x3 matrix
x = np.array([1, 0, -1])     # 3-element vector

y = A @ x                    # or np.matmul(A, x)
print(f"A @ x = {y}")        # Result: 2-element vector
```

```
A @ x = [-2 -2]
```
For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{B} \in \mathbb{R}^{n \times p}\):
\[(\mathbf{AB})_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}\]
Key rule: Inner dimensions must match: \((m \times \mathbf{n}) \cdot (\mathbf{n} \times p) = (m \times p)\)
```python
A = np.array([[1, 2],
              [3, 4]])  # 2x2
B = np.array([[5, 6],
              [7, 8]])  # 2x2

C = A @ B
print(f"A @ B =\n{C}")
```

```
A @ B =
[[19 22]
 [43 50]]
```
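Multiplying matrices whose inner dimensions differ fails immediately. A quick sketch of the rule above (the shapes here are illustrative):

```python
import numpy as np

A = np.zeros((2, 3))  # 2x3
B = np.zeros((2, 3))  # 2x3: inner dimensions are 3 and 2
try:
    A @ B             # (2x3)·(2x3) is undefined
except ValueError as err:
    print(f"Shape error: {err}")
```

Transposing the second operand fixes it: `A @ B.T` is a valid \((2 \times 3) \cdot (3 \times 2)\) product with shape \((2, 2)\).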
A fully-connected layer computes:
\[\mathbf{y} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b})\]
where:

- \(\mathbf{x}\): input vector (e.g., 784 pixels for MNIST)
- \(\mathbf{W}\): weight matrix (learned parameters)
- \(\mathbf{b}\): bias vector
- \(\sigma\): activation function (ReLU, sigmoid, etc.)
```python
def dense_layer(x, W, b, activation='relu'):
    """Implement a single dense layer"""
    z = W @ x + b  # Linear transformation
    if activation == 'relu':
        return np.maximum(0, z)
    elif activation == 'sigmoid':
        return 1 / (1 + np.exp(-z))
    else:
        return z

# Example: 4 inputs → 3 outputs
W = np.random.randn(3, 4)   # 3x4 weight matrix
b = np.zeros(3)             # 3-element bias
x = np.array([1, 2, 3, 4])  # 4-element input

y = dense_layer(x, W, b)
print(f"Output: {y}")
```

```
Output: [4.62300668 3.10401404 2.37554527]
```
Convolutions can also be expressed as matrix operations (though they’re usually implemented more efficiently):

```python
def conv2d_as_matmul(image, kernel):
    """
    Conceptually, convolution is matrix multiplication:
    each output pixel is the dot product of the flattened
    kernel with a flattened image patch (the im2col idea).
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1  # output height (valid convolution)
    ow = image.shape[1] - kw + 1  # output width
    # Stack every kh x kw patch as a row of a matrix (im2col)
    patches = np.array([image[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    # One matrix-vector product computes all output pixels at once
    return (patches @ kernel.ravel()).reshape(oh, ow)
```

For a function \(f(\mathbf{x})\), the gradient is:
\[\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}\]
The gradient points in the direction of steepest ascent.
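Because the gradient points uphill, repeatedly stepping *against* it drives \(f\) toward a minimum. A minimal gradient-descent sketch (the learning rate and step count are illustrative choices, not from the text):

```python
import numpy as np

def f(x):
    return np.sum(x**2)      # f(x) = x1^2 + x2^2

def grad_f(x):
    return 2 * x             # analytic gradient of f

x = np.array([3.0, 4.0])
for _ in range(50):
    x = x - 0.1 * grad_f(x)  # step opposite the gradient

print(x)  # close to [0, 0], the minimizer of f
```

Each step shrinks `x` by a factor of 0.8, so after 50 steps it is effectively at the origin.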
For composite functions \(f(g(\mathbf{x}))\):
\[\frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x_i}\]
This is the foundation of backpropagation!
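The rule is easy to check numerically. A small sketch (the functions are chosen purely for illustration): for \(g(x) = x^2\) and \(f(u) = 3u\), the chain rule gives \(f'(g(x)) \cdot g'(x) = 3 \cdot 2x\).

```python
def g(x):
    return x**2

def f(u):
    return 3 * u

x = 2.0
analytic = 3 * 2 * x  # chain rule: f'(g(x)) * g'(x) = 3 * 2x = 12

# Central-difference estimate of d/dx f(g(x))
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
print(analytic, numeric)  # both approximately 12
```

Backpropagation applies exactly this factorization, layer by layer, to every weight in a network.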
```python
def numerical_gradient(f, x, epsilon=1e-5):
    """Compute gradient numerically"""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += epsilon
        x_minus = x.copy()
        x_minus[i] -= epsilon
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * epsilon)
    return grad

# Example: gradient of f(x) = x1^2 + x2^2
def f(x):
    return x[0]**2 + x[1]**2

x = np.array([3.0, 4.0])
grad = numerical_gradient(f, x)
print(f"Gradient at {x}: {grad}")  # Should be [6, 8] = 2*x
```

```
Gradient at [3. 4.]: [6. 8.]
```
| Operation | NumPy | Description |
|---|---|---|
| Vector add | `x + y` | Element-wise addition |
| Dot product | `np.dot(x, y)` or `x @ y` | Scalar result |
| Matrix multiply | `A @ B` or `np.matmul(A, B)` | Matrix result |
| Transpose | `A.T` | Flip rows and columns |
| Element-wise | `A * B` | Hadamard product |
| Norm | `np.linalg.norm(x)` | Vector length |
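A few of the table’s entries in action (the values are chosen purely for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
x = np.array([3.0, 4.0])

print(A.T)                # transpose: rows and columns swapped
print(A * B)              # Hadamard (element-wise) product
print(np.linalg.norm(x))  # Euclidean length: 5.0
```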