Appendix A: Linear Algebra for ML

Essential Concepts for Edge Analytics

This appendix covers the linear algebra fundamentals needed to understand machine learning algorithms in the book.

Prerequisites

Basic familiarity with vectors and matrices. No advanced math required.

Vectors

A vector is an ordered list of numbers:

\[\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\]

Operations

Addition: Element-wise \[\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}\]

Scalar multiplication: \[c \cdot \mathbf{x} = \begin{bmatrix} c \cdot x_1 \\ c \cdot x_2 \\ \vdots \\ c \cdot x_n \end{bmatrix}\]

Dot product: \[\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n\]

Python Example

Code
import numpy as np

# Create vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(f"x + y = {x + y}")
print(f"2 * x = {2 * x}")
print(f"x · y = {np.dot(x, y)}")
x + y = [5 7 9]
2 * x = [2 4 6]
x · y = 32
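The Quick Reference at the end of this appendix lists np.linalg.norm; as a brief sketch, the Euclidean (L2) norm of a vector is the square root of its dot product with itself:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Euclidean (L2) norm: ||x|| = sqrt(x · x)
norm_manual = np.sqrt(np.dot(x, x))
norm_numpy = np.linalg.norm(x)

print(norm_manual)  # 5.0
print(norm_numpy)   # 5.0
```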

Matrices

A matrix is a 2D array of numbers:

\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\]

Matrix-Vector Multiplication

For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\):

\[\mathbf{y} = \mathbf{A}\mathbf{x}\]

where \(y_i = \sum_{j=1}^{n} a_{ij} x_j\)

In neural networks: This is exactly what a fully-connected layer does!

Code
# Matrix-vector multiplication
A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2x3 matrix

x = np.array([1, 0, -1])   # 3-element vector

y = A @ x  # or np.matmul(A, x)
print(f"A @ x = {y}")  # Result: 2-element vector
A @ x = [-2 -2]

Matrix-Matrix Multiplication

For \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{B} \in \mathbb{R}^{n \times p}\):

\[(\mathbf{AB})_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}\]

Key rule: Inner dimensions must match: \((m \times \mathbf{n}) \cdot (\mathbf{n} \times p) = (m \times p)\)

Code
A = np.array([[1, 2],
              [3, 4]])  # 2x2

B = np.array([[5, 6],
              [7, 8]])  # 2x2

C = A @ B
print(f"A @ B =\n{C}")
A @ B =
[[19 22]
 [43 50]]
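The 2x2 example above keeps every dimension equal, so the shape rule is easier to see with rectangular matrices; a short sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2x3

B = np.array([[1, 0],
              [0, 1],
              [1, 1]])     # 3x2

C = A @ B                  # inner dims match: (2x3) @ (3x2) -> (2x2)
print(C.shape)             # (2, 2)
print(C)
```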

Why This Matters for ML

Neural Network Layers

A fully-connected layer computes:

\[\mathbf{y} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b})\]

where:

- \(\mathbf{x}\): input vector (e.g., 784 pixels for MNIST)
- \(\mathbf{W}\): weight matrix (learned parameters)
- \(\mathbf{b}\): bias vector
- \(\sigma\): activation function (ReLU, sigmoid, etc.)

Code
def dense_layer(x, W, b, activation='relu'):
    """Implement a single dense layer"""
    z = W @ x + b  # Linear transformation

    if activation == 'relu':
        return np.maximum(0, z)
    elif activation == 'sigmoid':
        return 1 / (1 + np.exp(-z))
    else:
        return z

# Example: 4 inputs → 3 outputs
W = np.random.randn(3, 4)  # 3x4 matrix
b = np.zeros(3)            # 3-element bias
x = np.array([1, 2, 3, 4]) # 4-element input

y = dense_layer(x, W, b)
print(f"Output: {y}")
Output: [4.62300668 3.10401404 2.37554527]
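Layers compose by feeding each output into the next input. A minimal two-layer sketch, with shapes and the random seed chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative)

x = np.array([1.0, 2.0, 3.0, 4.0])       # 4 inputs

# Layer 1: 4 -> 3, ReLU activation
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
h = np.maximum(0, W1 @ x + b1)

# Layer 2: 3 -> 2, linear output
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)
y = W2 @ h + b2

print(h.shape, y.shape)  # (3,) (2,)
```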

Convolutions

Convolutions can also be expressed as matrix operations (though they’re usually implemented more efficiently):

Code
def conv2d_as_matmul(image, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in ML)
    via im2col: each image patch becomes a row of a matrix, so the
    whole convolution is one matrix-vector product."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    patches = np.array([image[i:i+kh, j:j+kw].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ kernel.ravel()).reshape(oh, ow)

# 3x3 image, 2x2 all-ones kernel -> 2x2 output of patch sums
img = np.arange(9.0).reshape(3, 3)
print(conv2d_as_matmul(img, np.ones((2, 2))))
[[ 8. 12.]
 [20. 24.]]

Gradients and Derivatives

The Gradient

For a function \(f(\mathbf{x})\), the gradient is:

\[\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}\]

The gradient points in the direction of steepest ascent.

Chain Rule

For composite functions \(f(g(\mathbf{x}))\):

\[\frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x_i}\]

This is the foundation of backpropagation!
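The chain rule can be verified numerically; a small sketch using \(g(x) = x^2\) and \(f(u) = \sin(u)\) as illustrative choices:

```python
import numpy as np

g  = lambda x: x**2        # inner function
f  = lambda u: np.sin(u)   # outer function
df = lambda u: np.cos(u)   # df/dg
dg = lambda x: 2 * x       # dg/dx

x = 1.5
analytic = df(g(x)) * dg(x)  # chain rule: df/dx = (df/dg) * (dg/dx)

eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

print(analytic, numeric)  # the two values agree closely
```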

Code
def numerical_gradient(f, x, epsilon=1e-5):
    """Compute gradient numerically"""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += epsilon
        x_minus = x.copy()
        x_minus[i] -= epsilon
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * epsilon)
    return grad

# Example: gradient of f(x) = x1^2 + x2^2
def f(x):
    return x[0]**2 + x[1]**2

x = np.array([3.0, 4.0])
grad = numerical_gradient(f, x)
print(f"Gradient at {x}: {grad}")  # Should be [6, 8] = 2*x
Gradient at [3. 4.]: [6. 8.]
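Since the gradient points uphill, gradient descent steps in the opposite direction. A minimal sketch minimizing \(f(\mathbf{x}) = x_1^2 + x_2^2\), with the step count and learning rate chosen for illustration:

```python
import numpy as np

def grad_f(x):
    return 2 * x  # analytic gradient of f(x) = x1^2 + x2^2

x = np.array([3.0, 4.0])
lr = 0.1          # learning rate (illustrative)

for _ in range(50):
    x = x - lr * grad_f(x)  # step against the gradient

print(x)  # converges toward the minimum at [0, 0]
```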

Quick Reference

| Operation | NumPy | Description |
|-----------|-------|-------------|
| Vector add | x + y | Element-wise addition |
| Dot product | np.dot(x, y) or x @ y | Scalar result |
| Matrix multiply | A @ B or np.matmul(A, B) | Matrix result |
| Transpose | A.T | Flip rows and columns |
| Element-wise | A * B | Hadamard product |
| Norm | np.linalg.norm(x) | Vector length |

Further Reading