For detailed theoretical foundations, mathematical proofs, and algorithm derivations, see Chapter 1: Edge Analytics Fundamentals and TinyML in the PDF textbook.
The PDF chapter includes:

- Complete derivations of edge computing trade-offs
- Detailed hardware architecture diagrams
- In-depth coverage of the TinyML ecosystem
- Mathematical foundations for memory budgeting
- Extended examples and case studies
Learning Objectives
By the end of this lab you will be able to:
Explain the difference between cloud ML and edge ML with concrete examples
Identify key constraints of edge devices (memory, compute, power, connectivity)
Describe the TinyML ecosystem and typical edge ML workflows
Understand the three-tier execution model used throughout the course
Theory Summary
What is Edge Analytics?
Edge Analytics brings machine learning intelligence directly to the devices that collect data, rather than sending everything to the cloud. This fundamental shift enables faster response times, better privacy, and offline functionality.
The Cloud vs Edge Dilemma: Traditional cloud-based ML systems send sensor data over the internet for processing. While this works well for many applications, it introduces latency (200-700ms round-trip), privacy concerns (your data leaves the device), connectivity requirements (no internet = no functionality), and ongoing costs for cloud API calls.
Edge ML Solution: By deploying lightweight models directly on edge devices like microcontrollers and single-board computers, we can process data locally in 10-50ms with complete privacy and no cloud dependency. The trade-off is model size and accuracy - edge models must be aggressively optimized to fit in kilobytes of memory instead of gigabytes.
Real-World Impact: Edge ML powers billions of devices today including smartphone wake-word detection (“Hey Siri”), fitness tracker activity recognition, and industrial predictive maintenance sensors. The global TinyML market is projected to reach $15 billion by 2030 as more applications move intelligence to the edge.
Key Concepts at a Glance
Core Concepts
Edge Computing: Processing data at or near the source rather than in centralized cloud servers
TinyML: Machine learning on microcontrollers with <1MB memory and <1mW power
Three-Tier Model: Progressive deployment (Notebook → Simulator → Device)
Memory Hierarchy: Flash (persistent model storage) vs SRAM (runtime workspace)
Quantization: Reducing model size 4x by converting float32 → int8
Latency Budget: Edge ML targets <100ms response time vs 200-1000ms for cloud
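The latency budget is easy to make concrete: the gap between cloud and edge comes almost entirely from the network round trip. A minimal sketch (the function name and sample figures are illustrative, drawn from the ranges quoted above):

```python
def end_to_end_latency_ms(inference_ms, network_rtt_ms=0.0):
    """Total response time: inference time plus any network round trip."""
    return inference_ms + network_rtt_ms

# Illustrative figures: fast cloud inference behind a 300 ms round trip
# versus slower on-device inference with no network hop at all.
cloud_ms = end_to_end_latency_ms(inference_ms=20, network_rtt_ms=300)
edge_ms = end_to_end_latency_ms(inference_ms=30)
print(f"cloud: {cloud_ms:.0f} ms, edge: {edge_ms:.0f} ms")  # cloud: 320 ms, edge: 30 ms
```

Even with a much faster model on the server, the round trip dominates, which is why the edge target of <100 ms is usually unreachable via the cloud.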
Common Pitfalls
Mistakes to Avoid
Assuming Edge Devices Have GB of RAM: The most common misconception is treating edge devices like laptops. An Arduino has 256KB RAM - that’s 65,000× less than a typical laptop! Always check target device specifications before designing your model.
Ignoring Runtime Memory vs Model Size: A 20KB model doesn’t mean you only need 20KB of RAM. The model weights live in Flash memory, but inference requires additional SRAM for input buffers, intermediate activations, and output tensors. A 20KB model might need 50KB+ of runtime RAM.
Forgetting to Normalize Inputs: Neural networks expect inputs in a consistent range (typically 0-1). Feeding raw sensor values (e.g., 0-255 pixel values) without normalization causes training to fail or models to perform poorly. Always include x = x / 255.0 or similar preprocessing.
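The normalization step can be a single line. A sketch using NumPy (the helper name is ours, not part of any framework):

```python
import numpy as np

def normalize_pixels(x):
    """Scale raw 0-255 pixel values into the 0-1 range a network expects."""
    return np.asarray(x, dtype=np.float32) / 255.0

raw = np.array([0, 128, 255])   # raw pixel/sensor values
scaled = normalize_pixels(raw)  # values now lie in [0, 1]
print(scaled.min(), scaled.max())
```

Whatever preprocessing you train with must be reproduced exactly on the device, so keep it this simple where you can.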
Skipping Level 2 (Simulator) Testing: Students often jump from notebook training directly to hardware deployment. Use simulators and TFLite interpreters to catch quantization errors, memory issues, and preprocessing bugs before fighting with embedded debuggers.
Quick Reference
Key Formulas
Model Memory Calculation:

$$\text{Model Size (bytes)} = \text{parameters} \times \text{bytes per parameter}$$

For float32: 4 bytes/param | For int8: 1 byte/param

Sample Memory Budget (Arduino Nano 33 BLE):

$$256\text{KB total SRAM} = \text{model arena} + \text{input buffer} + \text{stack} + \text{overhead}$$

Inference Latency Target:

$$\text{Latency} < \frac{1}{\text{sample rate}} \text{ for real-time processing}$$

Important Parameter Values

| Device Type | Flash | SRAM | Typical Use |
|------------|-------|------|-------------|
| Arduino Nano 33 BLE | 1MB | 256KB | Audio, motion, small models |
| ESP32 | 4MB | 520KB | WiFi + ML, larger models |
| Raspberry Pi 4 | SD card | 4-8GB | Edge server, vision tasks |
| Cloud Server | Unlimited | 16-128GB | Training, complex inference |

Typical Model Sizes:
Keyword Spotting (2 words): ~20KB
Gesture Recognition: ~50KB
Simple Image Classification: 200-500KB
Object Detection (YOLO-tiny): ~6MB
For deeper understanding, see these sections in Chapter 1 PDF:
Section 1.1: What is Edge Analytics? (pages 2-4)
Section 1.3: Edge Device Constraints (pages 8-10)
Section 1.4: The TinyML Ecosystem (pages 11-13)
Section 1.5: Three-Tier Execution Model (pages 14-16)
Exercises: Hands-on practice problems (pages 18-19)
Self-Assessment Checkpoints
Test your understanding before proceeding to the exercises.
Question 1: What is the main advantage of edge computing over cloud-based ML?
Answer: Edge computing reduces latency by processing data locally (10-50ms vs 200-700ms cloud round-trip), enables offline operation without internet connectivity, improves privacy by keeping sensitive data on-device, and eliminates ongoing cloud API costs. For real-time applications like voice assistants or autonomous vehicles, the cloud round-trip latency is often unacceptable.
Question 2: Calculate the total model size for a neural network with 50,000 parameters stored as float32.
Answer: Model Size = parameters × bytes per parameter = 50,000 × 4 bytes = 200,000 bytes ≈ 195 KB (200 KB if you count 1 KB = 1,000 bytes; the calculators in this lab use 1 KB = 1,024 bytes). Note that this is just the model weights stored in Flash memory - runtime inference will require additional SRAM for activations, input buffers, and temporary variables (typically 2-3× the model size).
Question 3: When should you use edge ML versus cloud ML?
Answer: Use edge ML when you need: (1) Low latency (<100ms), (2) Privacy/security (data cannot leave device), (3) Offline operation (no guaranteed connectivity), or (4) Low ongoing costs (avoid cloud fees). Use cloud ML when you need: (1) High accuracy with large models, (2) Centralized data aggregation, (3) Easy updates/retraining, or (4) Unlimited compute resources. Many real-world systems use hybrid approaches.
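Those criteria can be collapsed into a toy checklist. The function below is purely illustrative (not a real framework API); it mirrors the edge criteria and the large-model cloud criterion from this answer:

```python
def suggest_deployment(needs_low_latency, data_must_stay_on_device,
                       offline_required, needs_large_model):
    """Toy checklist mirroring the edge/cloud criteria in this answer."""
    edge_pressure = needs_low_latency or data_must_stay_on_device or offline_required
    if edge_pressure and needs_large_model:
        return "hybrid"   # e.g. a small on-device model with a cloud fallback
    if edge_pressure:
        return "edge"
    return "cloud"

print(suggest_deployment(True, False, False, False))   # edge
print(suggest_deployment(False, False, False, True))   # cloud
print(suggest_deployment(True, False, False, True))    # hybrid
```

Real decisions weigh these factors rather than treating them as booleans, but the structure of the trade-off is the same.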
Question 4: Your Arduino Nano 33 BLE has 256KB RAM but your model won’t run. The model file is only 80KB. What’s likely wrong?
Answer: The 80KB model size is the Flash storage size for weights. Runtime inference requires a tensor arena in SRAM for input buffers, intermediate activations, and output tensors. An 80KB model might need 200-300KB of SRAM to execute. Check the tensor arena size - it likely exceeds 256KB. Solutions: reduce model complexity, use quantization, or choose layers with lower activation memory requirements.
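A back-of-the-envelope check makes the failure mode concrete. Assuming arena multipliers in the 2.5-3.75× range discussed in this lab (the exact factor depends on the model's layer types):

```python
def arena_estimate_kb(model_kb, multiplier):
    """Rough runtime SRAM need: model size times an arena multiplier."""
    return model_kb * multiplier

MODEL_KB, SRAM_KB = 80, 256
for m in (2.5, 3.0, 3.75):
    need = arena_estimate_kb(MODEL_KB, m)
    verdict = "fits" if need <= SRAM_KB else "exceeds SRAM"
    print(f"x{m}: {need:.0f} KB -> {verdict}")
# Even the cases that nominally fit leave little headroom once the stack,
# input buffers, and runtime overhead claim their share of the 256 KB.
```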
Question 5: Why does the course use a three-tier execution model (Notebook → Simulator → Device)?
Answer: The three-tier model reduces debugging time and risk: (1) Notebook tier validates algorithms and trains models in a familiar environment with full debugging tools, (2) Simulator tier catches quantization errors, memory issues, and preprocessing bugs before hardware deployment, (3) Device tier confirms real-world performance. Jumping directly to hardware makes debugging extremely difficult because you can’t easily inspect intermediate values or use standard debugging tools.
Executable Code Examples
Below are practical Python functions you can run to analyze edge ML constraints.
Memory Requirement Calculator
```{python}
def calculate_model_memory(num_parameters, dtype='float32'):
    """Calculate model memory requirements in KB.

    Args:
        num_parameters: Number of model parameters (weights + biases)
        dtype: Data type ('float32', 'float16', or 'int8')

    Returns:
        Model size in KB
    """
    bytes_per_param = {'float32': 4, 'float16': 2, 'int8': 1}
    if dtype not in bytes_per_param:
        raise ValueError(f"dtype must be one of {list(bytes_per_param.keys())}")
    total_bytes = num_parameters * bytes_per_param[dtype]
    return total_bytes / 1024

# Example usage
model_params = 100000
print(f"Model with {model_params:,} parameters:")
print(f"  float32: {calculate_model_memory(model_params, 'float32'):.2f} KB")
print(f"  float16: {calculate_model_memory(model_params, 'float16'):.2f} KB")
print(f"  int8: {calculate_model_memory(model_params, 'int8'):.2f} KB")
print("\nQuantization savings: 4.0x smaller (float32 → int8)")
```
Model with 100,000 parameters:
float32: 390.62 KB
float16: 195.31 KB
int8: 97.66 KB
Quantization savings: 4.0x smaller (float32 → int8)
Inference Latency Estimator
```{python}
def estimate_inference_latency(model_flops, mcu_freq_mhz, efficiency_factor=0.5):
    """Estimate model inference latency on a microcontroller.

    Args:
        model_flops: Total floating-point operations for inference
        mcu_freq_mhz: MCU clock frequency in MHz
        efficiency_factor: Efficiency (0-1), accounting for memory access
            and instruction overhead (default: 0.5)

    Returns:
        Estimated latency in milliseconds
    """
    # Operations per second the MCU can theoretically perform
    ops_per_second = mcu_freq_mhz * 1e6
    # Account for real-world efficiency
    effective_ops_per_second = ops_per_second * efficiency_factor
    # Time in seconds, then milliseconds
    latency_seconds = model_flops / effective_ops_per_second
    return latency_seconds * 1000

# Example: Small CNN on Arduino Nano 33 BLE (64 MHz)
model_flops = 5e6   # 5 million operations
arduino_freq = 64   # MHz
latency = estimate_inference_latency(model_flops, arduino_freq)
print(f"Model with {model_flops/1e6:.1f}M FLOPs on Arduino Nano 33 BLE (64 MHz):")
print(f"  Estimated latency: {latency:.2f} ms")

# Check if suitable for real-time audio (10ms window)
if latency < 10:
    print("  ✓ Fast enough for real-time audio processing")
else:
    print("  ✗ Too slow for real-time audio (need <10ms)")

# Compare with ESP32
esp32_latency = estimate_inference_latency(model_flops, 240)
print(f"\nSame model on ESP32 (240 MHz): {esp32_latency:.2f} ms")
print(f"Speed improvement: {latency/esp32_latency:.1f}x faster")
```
Model with 5.0M FLOPs on Arduino Nano 33 BLE (64 MHz):
Estimated latency: 156.25 ms
✗ Too slow for real-time audio (need <10ms)
Same model on ESP32 (240 MHz): 41.67 ms
Speed improvement: 3.8x faster
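Inverting the estimator gives a FLOP budget for a target latency, under the same 0.5 efficiency assumption (the function name is ours, added for illustration):

```python
def max_flops_for_latency(latency_ms, mcu_freq_mhz, efficiency_factor=0.5):
    """Largest operation count that fits a latency budget (inverse of the estimate above)."""
    effective_ops_per_second = mcu_freq_mhz * 1e6 * efficiency_factor
    return effective_ops_per_second * (latency_ms / 1000)

# 10 ms real-time audio budget on the 64 MHz Arduino Nano 33 BLE
budget = max_flops_for_latency(10, 64)
print(f"FLOP budget for 10 ms at 64 MHz: {budget/1e3:.0f}k ops")  # 320k ops
```

So the 5M-FLOP model above is over an order of magnitude too large for the 10 ms audio window on this MCU; you would need a much smaller model or a faster chip.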
Device Compatibility Checker
```{python}
def check_device_fit(model_size_kb, device_ram_kb, device_flash_kb, arena_multiplier=2.5):
    """Check if a model fits on a target device.

    Args:
        model_size_kb: Model size in KB
        device_ram_kb: Available RAM on the device
        device_flash_kb: Available Flash storage
        arena_multiplier: RAM arena = model_size × multiplier (default: 2.5)

    Returns:
        Dictionary with fit analysis
    """
    required_ram = model_size_kb * arena_multiplier
    fits_flash = model_size_kb <= device_flash_kb
    fits_ram = required_ram <= device_ram_kb
    return {
        'model_size_kb': model_size_kb,
        'required_ram_kb': required_ram,
        'available_ram_kb': device_ram_kb,
        'fits_flash': fits_flash,
        'fits_ram': fits_ram,
        'can_deploy': fits_flash and fits_ram,
        'ram_usage_percent': (required_ram / device_ram_kb) * 100 if fits_ram else None,
    }

# Test with common devices
devices = {
    'Arduino Nano 33 BLE': {'ram': 256, 'flash': 1024},
    'ESP32': {'ram': 520, 'flash': 4096},
    'Raspberry Pi Pico': {'ram': 264, 'flash': 2048},
}
model_size = 98  # 98 KB quantized model (from 100K params × 1 byte)

print("Model Compatibility Analysis")
print("=" * 70)
for device_name, specs in devices.items():
    result = check_device_fit(model_size, specs['ram'], specs['flash'])
    status = "✓ CAN DEPLOY" if result['can_deploy'] else "✗ CANNOT DEPLOY"
    print(f"\n{device_name}: {status}")
    print(f"  Model size: {result['model_size_kb']} KB")
    print(f"  Required RAM: {result['required_ram_kb']:.1f} KB")
    print(f"  Available RAM: {result['available_ram_kb']} KB")
    if result['ram_usage_percent']:
        print(f"  RAM usage: {result['ram_usage_percent']:.1f}%")
```
Model Compatibility Analysis
======================================================================
Arduino Nano 33 BLE: ✓ CAN DEPLOY
Model size: 98 KB
Required RAM: 245.0 KB
Available RAM: 256 KB
RAM usage: 95.7%
ESP32: ✓ CAN DEPLOY
Model size: 98 KB
Required RAM: 245.0 KB
Available RAM: 520 KB
RAM usage: 47.1%
Raspberry Pi Pico: ✓ CAN DEPLOY
Model size: 98 KB
Required RAM: 245.0 KB
Available RAM: 264 KB
RAM usage: 92.8%
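The same arithmetic can be run in reverse to ask how large an int8 model each device could host. A sketch under the same 2.5× arena assumption (the helper name is ours):

```python
def max_int8_params(device_ram_kb, arena_multiplier=2.5):
    """Largest int8 parameter count whose estimated arena still fits in RAM.

    int8 stores one parameter per byte, so params = model size in KB × 1024.
    """
    max_model_kb = device_ram_kb / arena_multiplier
    return int(max_model_kb * 1024)

for name, ram_kb in [('Arduino Nano 33 BLE', 256),
                     ('ESP32', 520),
                     ('Raspberry Pi Pico', 264)]:
    print(f"{name}: ~{max_int8_params(ram_kb):,} int8 parameters")
```

This explains why the 98 KB (~100K parameter) model above sits right at the Arduino's limit: its estimated arena consumes nearly all 256 KB of SRAM.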
Device Comparison Table
```{python}
import pandas as pd

def compare_devices_for_model(model_params, dtype='int8'):
    """Generate a comparison table showing which devices can run a model.

    Args:
        model_params: Number of model parameters
        dtype: Data type for the model

    Returns:
        pandas DataFrame with device comparison
    """
    # Device specifications
    devices_data = [
        {'Device': 'Arduino Nano 33 BLE', 'CPU MHz': 64, 'RAM KB': 256,
         'Flash KB': 1024, 'Power mW': 20, 'Cost': '$15'},
        {'Device': 'ESP32', 'CPU MHz': 240, 'RAM KB': 520,
         'Flash KB': 4096, 'Power mW': 100, 'Cost': '$10'},
        {'Device': 'Raspberry Pi Pico', 'CPU MHz': 133, 'RAM KB': 264,
         'Flash KB': 2048, 'Power mW': 25, 'Cost': '$4'},
        {'Device': 'Raspberry Pi 4', 'CPU MHz': 1500, 'RAM KB': 4194304,
         'Flash KB': 32768000, 'Power mW': 5000, 'Cost': '$55'},
        {'Device': 'Jetson Nano', 'CPU MHz': 1430, 'RAM KB': 4194304,
         'Flash KB': 16384000, 'Power mW': 10000, 'Cost': '$99'},
    ]
    df = pd.DataFrame(devices_data)

    # Model size and required RAM (2.5x model size for the tensor arena)
    model_size_kb = calculate_model_memory(model_params, dtype)
    required_ram_kb = model_size_kb * 2.5

    # Check compatibility
    df['Model Size KB'] = round(model_size_kb, 1)
    df['Required RAM KB'] = round(required_ram_kb, 1)
    df['Flash OK'] = df['Flash KB'] >= model_size_kb
    df['RAM OK'] = df['RAM KB'] >= required_ram_kb
    df['Compatible'] = df['Flash OK'] & df['RAM OK']
    df['RAM Usage %'] = (required_ram_kb / df['RAM KB'] * 100).round(1)

    # Format for display
    df['Compatible'] = df['Compatible'].map({True: '✓', False: '✗'})
    return df

# Example: Compare devices for a 50K parameter model
model_params = 50000
comparison = compare_devices_for_model(model_params, dtype='int8')
print(f"\nDevice Comparison for {model_params:,} parameter model (int8 quantized)")
print("=" * 90)

# Display key columns
display_cols = ['Device', 'CPU MHz', 'RAM KB', 'Model Size KB',
                'Required RAM KB', 'RAM Usage %', 'Compatible']
print(comparison[display_cols].to_string(index=False))

print("\nKey Insights:")
print("- The 50K parameter model is 49 KB when quantized to int8")
print("- Runtime requires ~122 KB RAM (2.5x model size for tensor arena)")
print("- Arduino Nano and Pi Pico can run it, using ~46-50% of their RAM")
print("- Larger devices have plenty of headroom for more complex models")
```
Device Comparison for 50,000 parameter model (int8 quantized)
==========================================================================================
Device CPU MHz RAM KB Model Size KB Required RAM KB RAM Usage % Compatible
Arduino Nano 33 BLE 64 256 48.8 122.1 47.7 ✓
ESP32 240 520 48.8 122.1 23.5 ✓
Raspberry Pi Pico 133 264 48.8 122.1 46.2 ✓
Raspberry Pi 4 1500 4194304 48.8 122.1 0.0 ✓
Jetson Nano 1430 4194304 48.8 122.1 0.0 ✓
Key Insights:
- The 50K parameter model is 49 KB when quantized to int8
- Runtime requires ~122 KB RAM (2.5x model size for tensor arena)
- Arduino Nano and Pi Pico can run it, using ~46-50% of their RAM
- Larger devices have plenty of headroom for more complex models
Interactive Notebook
The notebook below contains runnable code for all Level 1 activities.
LAB 1: Introduction to Edge Analytics and TinyML
Overview
| Property | Value |
|----------|-------|
| Book Chapter | Chapter 01: Introduction to Machine Learning on Edge |
| Execution Levels | Level 1 only (Theory and Exploration) |
| Estimated Time | 45-60 minutes |
| Prerequisites | Basic Python knowledge |
Learning Objectives
After completing this lab, you will be able to:
Define Edge Analytics and explain its importance in IoT systems
Compare cloud-based ML vs edge-based ML architectures
Identify the constraints of edge devices (memory, compute, power)
Describe the TinyML ecosystem and its key frameworks
Analyze real-world applications of ML on edge devices
What is Edge Analytics?
Edge Analytics refers to the process of collecting, processing, and analyzing data at or near the source of data generation, rather than sending it to a centralized data center or cloud.
In the context of Machine Learning, Edge ML or TinyML means running ML models directly on edge devices like:

- Microcontrollers (Arduino, ESP32)
- Single-board computers (Raspberry Pi)
- Mobile devices
- IoT sensors
Section 1: Cloud vs Edge - Architecture Comparison
Three-Tier Activities
Level 1: Notebook
Run the embedded notebook above: follow along with the code cells, modify parameters and observe results, then complete the checkpoint questions.
Level 2: Simulator
This introductory lab is primarily conceptual, so there is no dedicated Level 2 simulation here. You do not need hardware or MCU simulators yet; they will be used starting from LAB08 and LAB11.
Level 3: Device
This lab does not require deployment to real devices. Hands-on MCU and Raspberry Pi deployment begins in:
LAB05 – Embedded deployment with TensorFlow Lite Micro
LAB08 – Arduino sensors and actuators
LAB12 – Streaming pipelines on Raspberry Pi
If you already have an Arduino Nano 33 BLE Sense or Raspberry Pi, you can read ahead in those labs to see how the three-tier model applies in practice.
Related Labs
Foundations Track
LAB02: ML Foundations - Build on edge computing concepts with neural network basics
LAB03: Quantization - Learn how to optimize models for edge deployment
Getting Started with Hardware
LAB05: Edge Deployment - Deploy your first model to a microcontroller
LAB08: Arduino Sensors - Start working with physical sensors
Related Resources
Hardware Guide - Equipment needed for Level 3
Troubleshooting - Common issues and solutions