Appendix A: Linear Algebra for ML

Essential Concepts for Edge Analytics

This appendix covers the linear algebra fundamentals needed to understand machine learning algorithms in the book.

Prerequisites

Basic familiarity with vectors and matrices. No advanced math required.

Vectors

A vector is an ordered list of numbers:

\[\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\]

Operations

Addition: Element-wise \[\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}\]

Scalar multiplication: \[c \cdot \mathbf{x} = \begin{bmatrix} c \cdot x_1 \\ c \cdot x_2 \\ \vdots \\ c \cdot x_n \end{bmatrix}\]

Dot product: \[\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n\]

Python Example

Code
import numpy as np

# Create vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(f"x + y = {x + y}")
print(f"2 * x = {2 * x}")
print(f"x · y = {np.dot(x, y)}")
x + y = [5 7 9]
2 * x = [2 4 6]
x · y = 32
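The Quick Reference at the end of this appendix lists np.linalg.norm; as a brief sketch, the Euclidean (L2) norm of a vector is the square root of its dot product with itself:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Euclidean (L2) norm: ||x|| = sqrt(x · x)
norm_manual = np.sqrt(np.dot(x, x))
norm_numpy = np.linalg.norm(x)

print(norm_manual)  # 5.0
print(norm_numpy)   # 5.0
```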

Matrices

A matrix is a 2D array of numbers:

\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\]

Matrix-Vector Multiplication

For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\):

\[\mathbf{y} = \mathbf{A}\mathbf{x}\]

where \(y_i = \sum_{j=1}^{n} a_{ij} x_j\)

In neural networks: This is exactly what a fully-connected layer does!

Code
# Matrix-vector multiplication
A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2x3 matrix

x = np.array([1, 0, -1])   # 3-element vector

y = A @ x  # or np.matmul(A, x)
print(f"A @ x = {y}")  # Result: 2-element vector
A @ x = [-2 -2]

Matrix-Matrix Multiplication

For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{B} \in \mathbb{R}^{n \times p}\):

\[(\mathbf{AB})_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}\]

Key rule: Inner dimensions must match: \((m \times \mathbf{n}) \cdot (\mathbf{n} \times p) = (m \times p)\)

Code
A = np.array([[1, 2],
              [3, 4]])  # 2x2

B = np.array([[5, 6],
              [7, 8]])  # 2x2

C = A @ B
print(f"A @ B =\n{C}")
A @ B =
[[19 22]
 [43 50]]
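The 2x2 example above keeps every dimension equal, so the shape rule is easier to see with rectangular matrices; a short sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2x3

B = np.array([[1, 0],
              [0, 1],
              [1, 1]])     # 3x2

C = A @ B                  # inner dims match: (2x3) @ (3x2) -> (2x2)
print(C.shape)             # (2, 2)
print(C)
```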

Why This Matters for ML

Neural Network Layers

A fully-connected layer computes:

\[\mathbf{y} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b})\]

where:

- \(\mathbf{x}\): input vector (e.g., 784 pixels for MNIST)
- \(\mathbf{W}\): weight matrix (learned parameters)
- \(\mathbf{b}\): bias vector
- \(\sigma\): activation function (ReLU, sigmoid, etc.)

Code
def dense_layer(x, W, b, activation='relu'):
    """Implement a single dense layer"""
    z = W @ x + b  # Linear transformation

    if activation == 'relu':
        return np.maximum(0, z)
    elif activation == 'sigmoid':
        return 1 / (1 + np.exp(-z))
    else:
        return z

# Example: 4 inputs → 3 outputs
W = np.random.randn(3, 4)  # 3x4 matrix
b = np.zeros(3)            # 3-element bias
x = np.array([1, 2, 3, 4]) # 4-element input

y = dense_layer(x, W, b)
print(f"Output: {y}")
Output: [4.62300668 3.10401404 2.37554527]
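Layers compose by feeding each output into the next input. A minimal two-layer sketch, with shapes and the random seed chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative)

x = np.array([1.0, 2.0, 3.0, 4.0])       # 4 inputs

# Layer 1: 4 -> 3, ReLU activation
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
h = np.maximum(0, W1 @ x + b1)

# Layer 2: 3 -> 2, linear output
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)
y = W2 @ h + b2

print(h.shape, y.shape)  # (3,) (2,)
```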

Convolutions

Convolutions can also be expressed as matrix operations (though they’re usually implemented more efficiently):

Code
def conv2d_as_matmul(image, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in ML)
    via im2col: each image patch becomes a row of a matrix, so the
    whole convolution is one matrix-vector product."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    patches = np.array([image[i:i+kh, j:j+kw].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ kernel.ravel()).reshape(oh, ow)

# 3x3 image, 2x2 all-ones kernel -> 2x2 output of patch sums
img = np.arange(9.0).reshape(3, 3)
print(conv2d_as_matmul(img, np.ones((2, 2))))
[[ 8. 12.]
 [20. 24.]]

Gradients and Derivatives

The Gradient

For a function \(f(\mathbf{x})\), the gradient is:

\[\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}\]

The gradient points in the direction of steepest ascent.

Chain Rule

For composite functions \(f(g(\mathbf{x}))\):

\[\frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x_i}\]

This is the foundation of backpropagation!
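The chain rule can be verified numerically; a small sketch using \(g(x) = x^2\) and \(f(u) = \sin(u)\) as illustrative choices:

```python
import numpy as np

g  = lambda x: x**2        # inner function
f  = lambda u: np.sin(u)   # outer function
df = lambda u: np.cos(u)   # df/dg
dg = lambda x: 2 * x       # dg/dx

x = 1.5
analytic = df(g(x)) * dg(x)  # chain rule: df/dx = (df/dg) * (dg/dx)

eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

print(analytic, numeric)  # the two values agree closely
```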

Code
def numerical_gradient(f, x, epsilon=1e-5):
    """Compute gradient numerically"""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += epsilon
        x_minus = x.copy()
        x_minus[i] -= epsilon
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * epsilon)
    return grad

# Example: gradient of f(x) = x1^2 + x2^2
def f(x):
    return x[0]**2 + x[1]**2

x = np.array([3.0, 4.0])
grad = numerical_gradient(f, x)
print(f"Gradient at {x}: {grad}")  # Should be [6, 8] = 2*x
Gradient at [3. 4.]: [6. 8.]
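Since the gradient points uphill, gradient descent steps in the opposite direction. A minimal sketch minimizing \(f(\mathbf{x}) = x_1^2 + x_2^2\), with the step count and learning rate chosen for illustration:

```python
import numpy as np

def grad_f(x):
    return 2 * x  # analytic gradient of f(x) = x1^2 + x2^2

x = np.array([3.0, 4.0])
lr = 0.1          # learning rate (illustrative)

for _ in range(50):
    x = x - lr * grad_f(x)  # step against the gradient

print(x)  # converges toward the minimum at [0, 0]
```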

Quick Reference

| Operation | NumPy | Description |
|-----------|-------|-------------|
| Vector add | x + y | Element-wise addition |
| Dot product | np.dot(x, y) or x @ y | Scalar result |
| Matrix multiply | A @ B or np.matmul(A, B) | Matrix result |
| Transpose | A.T | Flip rows and columns |
| Element-wise | A * B | Hadamard product |
| Norm | np.linalg.norm(x) | Vector length |

Further Reading