LAB01: Introduction to Edge Analytics

Foundations of TinyML and Edge Computing

PDF Textbook Reference

For detailed theoretical foundations, mathematical proofs, and algorithm derivations, see Chapter 1: Edge Analytics Fundamentals and TinyML in the PDF textbook.

The PDF chapter includes:

  • Complete derivations of edge computing trade-offs
  • Detailed hardware architecture diagrams
  • In-depth coverage of the TinyML ecosystem
  • Mathematical foundations for memory budgeting
  • Extended examples and case studies

Open In Colab

Download Notebook

Learning Objectives

By the end of this lab you will be able to:

  • Explain the difference between cloud ML and edge ML with concrete examples
  • Identify key constraints of edge devices (memory, compute, power, connectivity)
  • Describe the TinyML ecosystem and typical edge ML workflows
  • Understand the three-tier execution model used throughout the course

Theory Summary

What is Edge Analytics?

Edge Analytics brings machine learning intelligence directly to the devices that collect data, rather than sending everything to the cloud. This fundamental shift enables faster response times, better privacy, and offline functionality.

The Cloud vs Edge Dilemma: Traditional cloud-based ML systems send sensor data over the internet for processing. While this works well for many applications, it introduces latency (200-700ms round-trip), privacy concerns (your data leaves the device), connectivity requirements (no internet = no functionality), and ongoing costs for cloud API calls.

Edge ML Solution: By deploying lightweight models directly on edge devices like microcontrollers and single-board computers, we can process data locally in 10-50ms with complete privacy and no cloud dependency. The trade-off is model size and accuracy - edge models must be aggressively optimized to fit in kilobytes of memory instead of gigabytes.

Real-World Impact: Edge ML powers billions of devices today including smartphone wake-word detection (“Hey Siri”), fitness tracker activity recognition, and industrial predictive maintenance sensors. The global TinyML market is projected to reach $15 billion by 2030 as more applications move intelligence to the edge.

Key Concepts at a Glance

Core Concepts
  • Edge Computing: Processing data at or near the source rather than in centralized cloud servers
  • TinyML: Machine learning on microcontrollers with <1MB memory and <1mW power
  • Three-Tier Model: Progressive deployment (Notebook → Simulator → Device)
  • Memory Hierarchy: Flash (persistent model storage) vs SRAM (runtime workspace)
  • Quantization: Reducing model size 4x by converting float32 → int8
  • Latency Budget: Edge ML targets <100ms response time vs 200-1000ms for cloud
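
The 4x quantization saving above can be illustrated with a toy symmetric-scale quantizer in plain Python. This is a sketch only: real converters such as TensorFlow Lite use calibrated per-tensor (or per-channel) scales and zero points, but the core idea is the same.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats onto [-128, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]          # toy float32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)                                       # 1 byte each instead of 4
print(f"max round-trip error: {max_err:.4f}")      # bounded by scale / 2
```

Each stored value shrinks from 4 bytes to 1, at the cost of a small round-trip error proportional to the scale.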

Common Pitfalls

Mistakes to Avoid

Assuming Edge Devices Have GB of RAM: The most common misconception is treating edge devices like laptops. An Arduino Nano 33 BLE has 256KB of RAM - roughly 65,000× less than a 16GB laptop! Always check target device specifications before designing your model.

Ignoring Runtime Memory vs Model Size: A 20KB model doesn’t mean you only need 20KB of RAM. The model weights live in Flash memory, but inference requires additional SRAM for input buffers, intermediate activations, and output tensors. A 20KB model might need 50KB+ of runtime RAM.

Forgetting to Normalize Inputs: Neural networks expect inputs in a consistent range (typically 0-1). Feeding raw sensor values (e.g., 0-255 pixel values) without normalization causes training to fail or models to perform poorly. Always include x = x / 255.0 or similar preprocessing.
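
A minimal sketch of that normalization step in plain Python (the sensor ranges below are illustrative; production pipelines typically use NumPy or perform the scaling inside the input pipeline):

```python
def scale_to_unit(raw, lo=0.0, hi=255.0):
    """Min-max scale readings from [lo, hi] into the [0, 1] range the model expects."""
    return [(v - lo) / (hi - lo) for v in raw]

pixels = scale_to_unit([0, 64, 128, 255])            # raw image bytes
accel = scale_to_unit([-2.0, 0.0, 1.5], -4.0, 4.0)   # e.g. a +/-4g accelerometer
print(pixels)   # all values now lie in [0, 1]
print(accel)
```

The same scaling must be applied identically at training time and on-device, otherwise the deployed model sees inputs from a different distribution than it was trained on.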

Skipping Level 2 (Simulator) Testing: Students often jump from notebook training directly to hardware deployment. Use simulators and TFLite interpreters to catch quantization errors, memory issues, and preprocessing bugs before fighting with embedded debuggers.

Quick Reference

Key Formulas

Model Memory Calculation: \[\text{Model Size (bytes)} = \text{parameters} \times \text{bytes per parameter}\]

For float32: 4 bytes/param | For int8: 1 byte/param

Sample Memory Budget (Arduino Nano 33 BLE): \[256\text{KB total SRAM} = \text{model arena} + \text{input buffer} + \text{stack} + \text{overhead}\]

Inference Latency Target: \[\text{Latency} < \frac{1}{\text{sample rate}} \text{ for real-time processing}\]
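
The real-time inequality above translates directly into a quick budget check (the window rates below are illustrative examples, not measurements from a specific device):

```python
def max_latency_ms(sample_rate_hz):
    """Real-time budget: inference must finish before the next sample/window arrives."""
    return 1000.0 / sample_rate_hz

# Illustrative workloads and their per-inference latency budgets
for task, rate_hz in [("audio feature windows", 100),
                      ("IMU gesture windows", 25),
                      ("camera frames", 10)]:
    print(f"{task}: budget = {max_latency_ms(rate_hz):.1f} ms per inference")
```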

Important Parameter Values

| Device Type         | Flash    | SRAM     | Typical Use                 |
|---------------------|----------|----------|-----------------------------|
| Arduino Nano 33 BLE | 1MB      | 256KB    | Audio, motion, small models |
| ESP32               | 4MB      | 520KB    | WiFi + ML, larger models    |
| Raspberry Pi 4      | SD card  | 4-8GB    | Edge server, vision tasks   |
| Cloud Server        | Effectively unlimited | 16-128GB | Training, complex inference |

Typical Model Sizes:

  • Keyword Spotting (2 words): ~20KB
  • Gesture Recognition: ~50KB
  • Simple Image Classification: 200-500KB
  • Object Detection (YOLO-tiny): ~6MB

Self-Assessment Checkpoints

Test your understanding before proceeding to the exercises.

Question 1: What advantages does edge ML offer over cloud-based ML?

Answer: Edge computing reduces latency by processing data locally (10-50ms vs 200-700ms cloud round-trip), enables offline operation without internet connectivity, improves privacy by keeping sensitive data on-device, and eliminates ongoing cloud API costs. For real-time applications like voice assistants or autonomous vehicles, the cloud round-trip latency is often unacceptable.

Question 2: How much memory do the weights of a 50,000-parameter float32 model require?

Answer: Model Size = parameters × bytes per parameter = 50,000 × 4 bytes = 200,000 bytes ≈ 195 KB (using 1 KB = 1024 bytes). Note that this is just the model weights stored in Flash memory - runtime inference will require additional SRAM for activations, input buffers, and temporary variables (typically 2-3× the model size).

Question 3: When should you choose edge ML, and when cloud ML?

Answer: Use edge ML when you need: (1) Low latency (<100ms), (2) Privacy/security (data cannot leave device), (3) Offline operation (no guaranteed connectivity), or (4) Low ongoing costs (avoid cloud fees). Use cloud ML when you need: (1) High accuracy with large models, (2) Centralized data aggregation, (3) Easy updates/retraining, or (4) Unlimited compute resources. Many real-world systems use hybrid approaches.

Question 4: An 80KB quantized model fails to run on a device with 256KB of SRAM. What is the likely cause, and what can you do about it?

Answer: The 80KB model size is the Flash storage size for weights. Runtime inference requires a tensor arena in SRAM for input buffers, intermediate activations, and output tensors. An 80KB model might need 200-300KB of SRAM to execute. Check the tensor arena size - it likely exceeds 256KB. Solutions: reduce model complexity, use quantization, or choose layers with lower activation memory requirements.

Question 5: Why follow the three-tier execution model instead of deploying straight to hardware?

Answer: The three-tier model reduces debugging time and risk: (1) Notebook tier validates algorithms and trains models in a familiar environment with full debugging tools, (2) Simulator tier catches quantization errors, memory issues, and preprocessing bugs before hardware deployment, (3) Device tier confirms real-world performance. Jumping directly to hardware makes debugging extremely difficult because you can’t easily inspect intermediate values or use standard debugging tools.

Executable Code Examples

Below are practical Python functions you can run to analyze edge ML constraints.

Memory Requirement Calculator

Code
def calculate_model_memory(num_parameters, dtype='float32'):
    """
    Calculate model memory requirements in KB.

    Args:
        num_parameters: Number of model parameters (weights + biases)
        dtype: Data type ('float32', 'float16', or 'int8')

    Returns:
        Model size in KB
    """
    bytes_per_param = {'float32': 4, 'float16': 2, 'int8': 1}

    if dtype not in bytes_per_param:
        raise ValueError(f"dtype must be one of {list(bytes_per_param.keys())}")

    total_bytes = num_parameters * bytes_per_param[dtype]
    size_kb = total_bytes / 1024

    return size_kb

# Example usage
model_params = 100000
print(f"Model with {model_params:,} parameters:")
print(f"  float32: {calculate_model_memory(model_params, 'float32'):.2f} KB")
print(f"  float16: {calculate_model_memory(model_params, 'float16'):.2f} KB")
print(f"  int8:    {calculate_model_memory(model_params, 'int8'):.2f} KB")
print(f"\nQuantization savings: {4.0}x smaller (float32 → int8)")
Model with 100,000 parameters:
  float32: 390.62 KB
  float16: 195.31 KB
  int8:    97.66 KB

Quantization savings: 4.0x smaller (float32 → int8)

Inference Latency Estimator

Code
def estimate_inference_latency(model_flops, mcu_freq_mhz, efficiency_factor=0.5):
    """
    Estimate model inference latency on a microcontroller.

    Args:
        model_flops: Total floating-point operations for inference
        mcu_freq_mhz: MCU clock frequency in MHz
        efficiency_factor: Efficiency (0-1), accounting for memory access,
                          instruction overhead (default: 0.5)

    Returns:
        Estimated latency in milliseconds
    """
    # Operations per second the MCU can theoretically perform
    ops_per_second = mcu_freq_mhz * 1e6

    # Account for real-world efficiency
    effective_ops_per_second = ops_per_second * efficiency_factor

    # Time in seconds
    latency_seconds = model_flops / effective_ops_per_second

    # Convert to milliseconds
    latency_ms = latency_seconds * 1000

    return latency_ms

# Example: Small CNN on Arduino Nano 33 BLE (64 MHz)
model_flops = 5e6  # 5 million operations
arduino_freq = 64   # MHz

latency = estimate_inference_latency(model_flops, arduino_freq)
print(f"Model with {model_flops/1e6:.1f}M FLOPs on Arduino Nano 33 BLE (64 MHz):")
print(f"  Estimated latency: {latency:.2f} ms")

# Check if suitable for real-time audio (10ms window)
if latency < 10:
    print("  ✓ Fast enough for real-time audio processing")
else:
    print("  ✗ Too slow for real-time audio (need <10ms)")

# Compare with ESP32
esp32_latency = estimate_inference_latency(model_flops, 240)
print(f"\nSame model on ESP32 (240 MHz): {esp32_latency:.2f} ms")
print(f"Speed improvement: {latency/esp32_latency:.1f}x faster")
Model with 5.0M FLOPs on Arduino Nano 33 BLE (64 MHz):
  Estimated latency: 156.25 ms
  ✗ Too slow for real-time audio (need <10ms)

Same model on ESP32 (240 MHz): 41.67 ms
Speed improvement: 3.8x faster

Device Compatibility Checker

Code
def check_device_fit(model_size_kb, device_ram_kb, device_flash_kb,
                     arena_multiplier=2.5):
    """
    Check if a model fits on a target device.

    Args:
        model_size_kb: Model size in KB
        device_ram_kb: Available RAM on device
        device_flash_kb: Available Flash storage
        arena_multiplier: RAM arena = model_size × multiplier (default: 2.5)

    Returns:
        Dictionary with fit analysis
    """
    required_ram = model_size_kb * arena_multiplier

    fits_flash = model_size_kb <= device_flash_kb
    fits_ram = required_ram <= device_ram_kb

    result = {
        'model_size_kb': model_size_kb,
        'required_ram_kb': required_ram,
        'available_ram_kb': device_ram_kb,
        'fits_flash': fits_flash,
        'fits_ram': fits_ram,
        'can_deploy': fits_flash and fits_ram,
        'ram_usage_percent': (required_ram / device_ram_kb) * 100 if fits_ram else None
    }

    return result

# Test with common devices
devices = {
    'Arduino Nano 33 BLE': {'ram': 256, 'flash': 1024},
    'ESP32': {'ram': 520, 'flash': 4096},
    'Raspberry Pi Pico': {'ram': 264, 'flash': 2048},
}

model_size = 98  # 98 KB quantized model (from 100K params × 1 byte)

print("Model Compatibility Analysis")
print("=" * 70)
for device_name, specs in devices.items():
    result = check_device_fit(model_size, specs['ram'], specs['flash'])
    status = "✓ CAN DEPLOY" if result['can_deploy'] else "✗ CANNOT DEPLOY"

    print(f"\n{device_name}: {status}")
    print(f"  Model size: {result['model_size_kb']} KB")
    print(f"  Required RAM: {result['required_ram_kb']:.1f} KB")
    print(f"  Available RAM: {result['available_ram_kb']} KB")
    if result['ram_usage_percent'] is not None:
        print(f"  RAM usage: {result['ram_usage_percent']:.1f}%")
Model Compatibility Analysis
======================================================================

Arduino Nano 33 BLE: ✓ CAN DEPLOY
  Model size: 98 KB
  Required RAM: 245.0 KB
  Available RAM: 256 KB
  RAM usage: 95.7%

ESP32: ✓ CAN DEPLOY
  Model size: 98 KB
  Required RAM: 245.0 KB
  Available RAM: 520 KB
  RAM usage: 47.1%

Raspberry Pi Pico: ✓ CAN DEPLOY
  Model size: 98 KB
  Required RAM: 245.0 KB
  Available RAM: 264 KB
  RAM usage: 92.8%

Device Comparison Table

Code
import pandas as pd

def compare_devices_for_model(model_params, dtype='int8'):
    """
    Generate a comparison table showing which devices can run a model.

    Args:
        model_params: Number of model parameters
        dtype: Data type for the model

    Returns:
        pandas DataFrame with device comparison
    """
    # Device specifications
    devices_data = [
        {'Device': 'Arduino Nano 33 BLE', 'CPU MHz': 64, 'RAM KB': 256,
         'Flash KB': 1024, 'Power mW': 20, 'Cost': '$15'},
        {'Device': 'ESP32', 'CPU MHz': 240, 'RAM KB': 520,
         'Flash KB': 4096, 'Power mW': 100, 'Cost': '$10'},
        {'Device': 'Raspberry Pi Pico', 'CPU MHz': 133, 'RAM KB': 264,
         'Flash KB': 2048, 'Power mW': 25, 'Cost': '$4'},
        {'Device': 'Raspberry Pi 4', 'CPU MHz': 1500, 'RAM KB': 4194304,
         'Flash KB': 32768000, 'Power mW': 5000, 'Cost': '$55'},
        {'Device': 'Jetson Nano', 'CPU MHz': 1430, 'RAM KB': 4194304,
         'Flash KB': 16384000, 'Power mW': 10000, 'Cost': '$99'},
    ]

    df = pd.DataFrame(devices_data)

    # Calculate model size
    model_size_kb = calculate_model_memory(model_params, dtype)

    # Calculate required RAM (2.5x model size for tensor arena)
    required_ram_kb = model_size_kb * 2.5

    # Check compatibility
    df['Model Size KB'] = round(model_size_kb, 1)
    df['Required RAM KB'] = round(required_ram_kb, 1)
    df['Flash OK'] = df['Flash KB'] >= model_size_kb
    df['RAM OK'] = df['RAM KB'] >= required_ram_kb
    df['Compatible'] = df['Flash OK'] & df['RAM OK']
    df['RAM Usage %'] = (required_ram_kb / df['RAM KB'] * 100).round(1)

    # Format for display
    df['Compatible'] = df['Compatible'].map({True: '✓', False: '✗'})

    return df

# Example: Compare devices for a 50K parameter model
model_params = 50000
comparison = compare_devices_for_model(model_params, dtype='int8')

print(f"\nDevice Comparison for {model_params:,} parameter model (int8 quantized)")
print("=" * 90)

# Display key columns
display_cols = ['Device', 'CPU MHz', 'RAM KB', 'Model Size KB',
                'Required RAM KB', 'RAM Usage %', 'Compatible']
print(comparison[display_cols].to_string(index=False))

print("\nKey Insights:")
print("- The 50K parameter model is 49 KB when quantized to int8")
print("- Runtime requires ~122 KB RAM (2.5x model size for tensor arena)")
print("- Arduino Nano and Pi Pico can run it, using ~46-50% of their RAM")
print("- Larger devices have plenty of headroom for more complex models")

Device Comparison for 50,000 parameter model (int8 quantized)
==========================================================================================
             Device  CPU MHz  RAM KB  Model Size KB  Required RAM KB  RAM Usage % Compatible
Arduino Nano 33 BLE       64     256           48.8            122.1         47.7          ✓
              ESP32      240     520           48.8            122.1         23.5          ✓
  Raspberry Pi Pico      133     264           48.8            122.1         46.2          ✓
     Raspberry Pi 4     1500 4194304           48.8            122.1          0.0          ✓
        Jetson Nano     1430 4194304           48.8            122.1          0.0          ✓

Key Insights:
- The 50K parameter model is 49 KB when quantized to int8
- Runtime requires ~122 KB RAM (2.5x model size for tensor arena)
- Arduino Nano and Pi Pico can run it, using ~46-50% of their RAM
- Larger devices have plenty of headroom for more complex models

Interactive Notebook

The notebook below contains runnable code for all Level 1 activities.

LAB 1: Introduction to Edge Analytics and TinyML


Overview

| Property         | Value                                               |
|------------------|-----------------------------------------------------|
| Book Chapter     | Chapter 01: Introduction to Machine Learning on Edge |
| Execution Levels | Level 1 only (Theory and Exploration)               |
| Estimated Time   | 45-60 minutes                                       |
| Prerequisites    | Basic Python knowledge                              |

Learning Objectives

After completing this lab, you will be able to:

  1. Define Edge Analytics and explain its importance in IoT systems
  2. Compare cloud-based ML vs edge-based ML architectures
  3. Identify the constraints of edge devices (memory, compute, power)
  4. Describe the TinyML ecosystem and its key frameworks
  5. Analyze real-world applications of ML on edge devices

What is Edge Analytics?

Edge Analytics refers to the process of collecting, processing, and analyzing data at or near the source of data generation, rather than sending it to a centralized data center or cloud.

In the context of Machine Learning, Edge ML or TinyML means running ML models directly on edge devices like:

  • Microcontrollers (Arduino, ESP32)
  • Single-board computers (Raspberry Pi)
  • Mobile devices
  • IoT sensors

Section 1: Cloud vs Edge - Architecture Comparison

Traditional Cloud-Based ML

┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────┐
│ Sensor  │───▶│ Gateway │───▶│    Cloud    │───▶│ Response │
│         │    │         │    │ ML Server   │    │          │
└─────────┘    └─────────┘    └─────────────┘    └──────────┘
     │                              │
     │         Network Latency      │
     │◀────────────────────────────▶│
              100ms - 1000ms+

Edge-Based ML

┌─────────────────────────────┐    ┌──────────┐
│  Sensor + Edge ML Device   │───▶│ Response │
│  (All processing on-device)│    │          │
└─────────────────────────────┘    └──────────┘
           │                  │
           │ Local Inference  │
           │◀────────────────▶│
                 1ms - 100ms

Section 2: Why Edge Analytics?

Key Benefits

| Benefit     | Description                                | Example                              |
|-------------|--------------------------------------------|--------------------------------------|
| Low Latency | Real-time responses without network delay  | Autonomous vehicles, gesture control |
| Privacy     | Data stays on device, never sent to cloud  | Voice assistants, health monitoring  |
| Reliability | Works without internet connection          | Industrial sensors, remote monitoring |
| Bandwidth   | Reduced data transmission costs            | Video analytics, continuous monitoring |
| Power       | Optimized for battery operation            | Wearables, wildlife tracking         |

Trade-offs

| Challenge       | Description              | Mitigation                       |
|-----------------|--------------------------|----------------------------------|
| Limited Memory  | KB-MB vs GB on cloud     | Model compression, quantization  |
| Limited Compute | MHz vs GHz processors    | Efficient architectures          |
| Model Size      | Must fit in flash memory | Pruning, knowledge distillation  |
| Accuracy        | May be lower than cloud  | Careful model selection          |

Section 3: The TinyML Ecosystem

Key Frameworks and Tools

┌─────────────────────────────────────────────────────────────────┐
│                     TRAINING (Cloud/Laptop)                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ TensorFlow  │  │   PyTorch   │  │   Keras     │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
│         │                │                │                      │
│         ▼                ▼                ▼                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              Model Optimization & Conversion               │  │
│  │  • Quantization (Float32 → Int8)                          │  │
│  │  • Pruning (Remove unnecessary weights)                    │  │
│  │  • Knowledge Distillation                                  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   DEPLOYMENT (Edge Device)                      │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ TensorFlow Lite │  │ TFLite Micro    │  │    Edge TPU     │  │
│  │ (Mobile/Pi)     │  │ (MCU)           │  │  (Coral)        │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
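
Of the optimization steps in the diagram, magnitude pruning is the easiest to sketch in plain Python: zero out the smallest-magnitude weights so the model compresses well. This is illustrative only; real toolchains such as the TensorFlow Model Optimization Toolkit apply pruning gradually during training so accuracy can recover.

```python
def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero."""
    k = int(len(weights) * sparsity)          # how many weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = prune_by_magnitude(w, sparsity=0.5)
print(pruned)
print(sum(1 for x in pruned if x == 0.0), "of", len(w), "weights zeroed")
```

The zeroed weights compress well on Flash and, with sparse-aware kernels, can also reduce inference cost.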

Section 4: Real-World Applications

Application Areas

1. Keyword Spotting (Wake Word Detection)

  • Device: Arduino Nano 33 BLE Sense
  • Model Size: ~18 KB
  • Latency: <20 ms
  • Example: “Hey Google”, “Alexa”

2. Gesture Recognition

  • Device: Arduino with IMU
  • Model Size: ~10 KB
  • Latency: <50 ms
  • Example: Fitness tracking, game controllers

3. Anomaly Detection

  • Device: ESP32
  • Model Size: ~5 KB
  • Latency: <100 ms
  • Example: Predictive maintenance, security

4. Person Detection

  • Device: Raspberry Pi + Camera
  • Model Size: ~300 KB
  • Latency: ~100 ms
  • Example: Smart doorbells, occupancy sensing

5. Object Detection

  • Device: Raspberry Pi + Coral TPU
  • Model Size: ~4 MB
  • Latency: ~30 ms
  • Example: Retail analytics, traffic monitoring

Section 5: The Three-Tier Execution Model

Throughout this course, we use a three-tier approach to edge ML development:

Tier 1: Notebook Simulation

  • Environment: Jupyter notebook on laptop or Google Colab
  • Framework: Full TensorFlow/PyTorch
  • Purpose: Learn concepts, train models, experiment freely
  • Constraints: None (full resources available)

Tier 2: Simulator/Emulator

  • Environment: TFLite interpreter, QEMU, Raspberry Pi
  • Framework: TensorFlow Lite
  • Purpose: Test model under edge-like constraints
  • Constraints: Limited to TFLite operations, quantized models

Tier 3: On-Device Deployment

  • Environment: Arduino, ESP32, Raspberry Pi
  • Framework: TensorFlow Lite Micro
  • Purpose: Real deployment with actual hardware
  • Constraints: Full edge constraints (memory, compute, power)

Section 6: Course Roadmap

This lab book is organized into the following parts:

Part 1: Foundations (LAB 1-3)

  • LAB 1: Introduction to Edge Analytics (this lab)
  • LAB 2: Machine Learning Fundamentals
  • LAB 3: TensorFlow Lite and Quantization

Part 2: Audio Applications (LAB 4-5)

  • LAB 4: Keyword Spotting / Wake Word Detection
  • LAB 5: Deployment to Microcontrollers

Part 3: Sensor Applications (LAB 8)

  • LAB 8: Sensor Data Processing on Edge

Part 4: Streaming and Distributed (LAB 12-13)

  • LAB 12: Stream Analytics on Raspberry Pi
  • LAB 13: Distributed Query Processing

Part 5: Computer Vision (LAB 16)

  • LAB 16: Real-time Computer Vision with YOLO

Part 6: Advanced Topics (LAB 17)

  • LAB 17: Federated Learning with Flower

Checkpoint: Self-Assessment

Before proceeding to LAB 2, make sure you can answer the questions in the Self-Assessment Checkpoints section above.



Notebook last updated: December 2025

Part of the Edge Analytics Lab Book

Three-Tier Activities

Run the embedded notebook above. Key exercises:

  1. Follow along with the code cells
  2. Modify parameters and observe results
  3. Complete the checkpoint questions

This introductory lab is primarily conceptual, so there is no dedicated Level 2 simulation here. You do not need hardware or MCU simulators yet; they will be used starting from LAB08 and LAB11.

This lab does not require deployment to real devices. Hands-on MCU and Raspberry Pi deployment begins in:

  • LAB05 – Embedded deployment with TensorFlow Lite Micro
  • LAB08 – Arduino sensors and actuators
  • LAB12 – Streaming pipelines on Raspberry Pi

If you already have an Arduino Nano 33 BLE Sense or Raspberry Pi, you can read ahead in those labs to see how the three-tier model applies in practice.