Visual Troubleshooting Guides
Flowchart-Driven Diagnosis for Edge ML Issues
Quick Diagnosis Flowcharts
Use these visual guides to diagnose common edge ML issues systematically. Each flowchart provides a step-by-step decision tree to identify and resolve problems.
- Start at the top node describing your symptom
- Follow the decision paths based on your observations
- Apply the suggested solution at terminal nodes
- If problem persists, check the text-based Troubleshooting Guide
General Edge ML Issues
1. Model Loading Failures
When your model fails to load on the device, this flowchart helps identify whether it’s a file issue, memory problem, or compatibility issue.
flowchart TD
A[Start: Model won't load] --> B{Does model file exist?}
B -->|No| C[Check file path is correct<br/>Verify file was uploaded to device<br/>Check SD card if used]
B -->|Yes| D{Check file size}
D -->|0 bytes or corrupted| E[Re-download or re-convert model<br/>Verify TFLite conversion succeeded<br/>Check disk space during save]
D -->|Normal size| F{Enough Flash memory?}
F -->|No| G[Model too large for device<br/>- Reduce model complexity<br/>- Use more aggressive quantization<br/>- Remove unused layers]
F -->|Yes| H{Check model format}
H -->|Not .tflite| I[Convert to TFLite format:<br/>converter = TFLiteConverter.from_keras_model model<br/>tflite_model = converter.convert]
H -->|Valid .tflite| J{TFLite version compatible?}
J -->|Version mismatch| K[Update TFLite Micro to match<br/>or re-convert model with<br/>compatible TF version]
J -->|Compatible| L{GetModel returns null?}
L -->|Yes| M[Model schema incompatible<br/>- Rebuild with correct flatbuffer<br/>- Check for custom ops<br/>- Verify model_data array]
L -->|No| N[Check AllocateTensors result<br/>See Memory Allocation flowchart]
style A fill:#ff6b6b
style C fill:#4ecdc4
style E fill:#4ecdc4
style G fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style N fill:#ffe66d
2. Inference Accuracy Problems
When your deployed model gives poor results despite good training accuracy, use this diagnostic path.
flowchart TD
A[Start: Poor inference accuracy] --> B{Works in notebook/Python?}
B -->|No| C[Problem is with model itself<br/>- Retrain with more data<br/>- Check for overfitting<br/>- Validate training pipeline]
B -->|Yes| D{Works with TFLite interpreter?}
D -->|No| E[Quantization error<br/>- Use representative dataset<br/>- Try quantization-aware training<br/>- Check for extreme activations]
D -->|Yes| F{Check preprocessing}
F -->|Different from training| G[Match preprocessing exactly:<br/>- Same normalization 0-1 or -1 to 1<br/>- Same scaling factors<br/>- Same color space RGB/BGR]
F -->|Matches training| H{Check input data type}
H -->|Type mismatch| I[Fix input tensor type:<br/>- float32 vs uint8<br/>- Signed vs unsigned<br/>- Check input_details]
H -->|Correct type| J{Check output interpretation}
J -->|Wrong postprocessing| K[Fix output handling:<br/>- Apply softmax if needed<br/>- Correct argmax usage<br/>- Check dequantization]
J -->|Correct| L{Sensor data quality}
L -->|Noisy/inconsistent| M[Improve sensor pipeline:<br/>- Add filtering moving avg<br/>- Increase sampling rate<br/>- Calibrate sensors per user]
L -->|Good quality| N[Model may not generalize<br/>- Collect more diverse data<br/>- Test edge cases<br/>- Consider retraining]
style A fill:#ff6b6b
style C fill:#4ecdc4
style E fill:#4ecdc4
style G fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style N fill:#4ecdc4
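The preprocessing-mismatch branch is worth a concrete illustration. A minimal sketch (function and mode names are my own): the two common image normalizations differ only by a scale and offset, but feeding one where the other was used during training silently wrecks accuracy.

```python
def normalize_pixels(values, mode):
    """Normalize raw 0-255 pixel values the same way the training pipeline did.

    mode="zero_one" -> [0, 1]   (x / 255.0)
    mode="sym_one"  -> [-1, 1]  (x / 127.5 - 1.0)
    A mismatch here is one of the most common causes of on-device accuracy loss.
    """
    if mode == "zero_one":
        return [v / 255.0 for v in values]
    if mode == "sym_one":
        return [v / 127.5 - 1.0 for v in values]
    raise ValueError("unknown normalization mode: " + mode)
```

Compare a few raw samples through both your training pipeline and your device pipeline; the outputs should match element for element.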
3. Memory Allocation Errors
The dreaded “tensor arena too small” and related memory issues.
flowchart TD
A[Start: Memory allocation fails] --> B{Error message type?}
B -->|Tensor arena too small| C[Check arena_used_bytes]
C --> D{Have you profiled it?}
D -->|No| E[Add this code:<br/>Serial.print arena_used_bytes<br/>Start with large arena 100KB]
D -->|Yes| F[Set arena = used_bytes × 1.2<br/>20% safety margin]
B -->|AllocateTensors fails| G{Enough total SRAM?}
G -->|No| H[Reduce memory usage:<br/>- Smaller input buffers<br/>- Reduce model size<br/>- Remove debug code]
G -->|Yes| I{Check for memory leaks}
I -->|Calling AllocateTensors<br/>repeatedly| J[Only call once in setup<br/>Not in loop]
I -->|Static allocation OK| K[Check ops resolver:<br/>Missing required ops?]
B -->|Segmentation fault| L{Using static allocation?}
L -->|No malloc on MCU| M[Declare tensors static:<br/>static uint8_t tensor_arena<br/>alignas 16]
L -->|Already static| N{Check array bounds}
N -->|Buffer overflow| O[Validate input sizes:<br/>- Clip values to expected range<br/>- Check buffer indexing<br/>- Add bounds checking]
N -->|Bounds OK| P[Enable debug symbols<br/>Use GDB or PlatformIO debugger]
B -->|Stack overflow| Q[Increase stack size or<br/>reduce local variables<br/>Move large arrays to static]
style A fill:#ff6b6b
style E fill:#ffe66d
style F fill:#4ecdc4
style H fill:#4ecdc4
style J fill:#4ecdc4
style K fill:#ffe66d
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#ffe66d
style Q fill:#4ecdc4
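The arena-sizing rule from the flowchart (measured usage plus a 20% margin) can be captured as a small helper. A sketch, with the alignment value as an assumption (TFLite Micro commonly wants a 16-byte-aligned arena buffer):

```python
import math

def recommended_arena_size(arena_used_bytes, margin=0.2, alignment=16):
    """Turn the value reported by arena_used_bytes after a successful
    AllocateTensors into a compile-time arena size: add a safety margin
    (20% by default) and round up to the buffer alignment."""
    padded = int(math.ceil(arena_used_bytes * (1.0 + margin)))
    return ((padded + alignment - 1) // alignment) * alignment
```

Profile once with a deliberately oversized arena, read back the used bytes, then hard-code the recommended size for production builds.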
Hardware-Specific Issues
4. Arduino Upload Failures
When you can’t upload your sketch to Arduino.
flowchart TD
A[Start: Upload fails] --> B{Arduino detected?}
B -->|Port not shown| C[Check USB connection:<br/>- Try different cable<br/>- Try different USB port<br/>- Restart Arduino IDE]
B -->|Port shows up| D{Correct board selected?}
D -->|Wrong board| E[Tools → Board → select correct:<br/>Arduino Nano 33 BLE<br/>ESP32 Dev Module etc]
D -->|Correct board| F{Upload error message?}
F -->|avrdude timeout| G[Press reset button twice<br/>quickly to enter bootloader<br/>Upload within 8 seconds]
F -->|Port in use| H[Close Serial Monitor<br/>Close other IDE instances<br/>Restart IDE]
F -->|Sketch too big| I[Reduce program size:<br/>- Use MicroMutableOpResolver<br/>- Remove debug prints<br/>- Smaller model]
F -->|Compilation error| J{TensorFlow Lite library?}
J -->|Not installed| K[Install via Library Manager:<br/>Arduino_TensorFlowLite<br/>or TensorFlowLite_ESP32]
J -->|Version conflict| L[Update all libraries<br/>Check TF version compatibility<br/>Use 2.4.0-ALPHA for MCU]
J -->|Code syntax error| M[Check error message:<br/>- Missing semicolons<br/>- Undeclared variables<br/>- Type mismatches]
style A fill:#ff6b6b
style C fill:#4ecdc4
style E fill:#4ecdc4
style G fill:#4ecdc4
style H fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style L fill:#4ecdc4
style M fill:#ffe66d
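The "sketch too big" branch often comes down to flash budget arithmetic. A rough sketch (all numbers illustrative, not from any specific board): the model array, the TFLite Micro runtime, and your application code must fit together in flash.

```python
def fits_in_flash(model_bytes, runtime_bytes, app_bytes, flash_bytes, reserve=0.1):
    """Rough flash budget check: total firmware footprint must fit within
    flash minus a reserve (10% here) for the bootloader and future growth.
    Returns (fits, free_bytes)."""
    budget = int(flash_bytes * (1.0 - reserve))
    total = model_bytes + runtime_bytes + app_bytes
    return total <= budget, budget - total
```

If it does not fit, the flowchart's remedies apply in order of impact: a smaller or more aggressively quantized model first, then a leaner ops resolver (MicroMutableOpResolver with only the ops you use), then stripping debug code.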
5. ESP32 WiFi Connection Issues
Debugging wireless connectivity on ESP32.
flowchart TD
A[Start: WiFi won't connect] --> B{WiFi.status returns?}
B -->|WL_NO_SSID_AVAIL| C[Network not found:<br/>- Check SSID spelling<br/>- Ensure 2.4GHz not 5GHz<br/>- Move closer to router]
B -->|WL_CONNECT_FAILED| D[Authentication failed:<br/>- Verify password correct<br/>- Check security type WPA2<br/>- Router MAC filter?]
B -->|WL_DISCONNECTED| E{Connecting then drops?}
E -->|Yes| F[Weak signal or interference:<br/>- Move closer to AP<br/>- Add external antenna<br/>- Reduce TX power if overheating]
E -->|Never connects| G{Delay after WiFi.begin?}
G -->|No| H[Add connection timeout:<br/>while WiFi.status != WL_CONNECTED<br/> delay 500<br/> retry up to 20 times]
G -->|Yes timeout| I{Check router settings}
I -->|AP isolation enabled| J[Disable AP isolation<br/>on router to allow<br/>device-to-device comm]
I -->|DHCP full| K[Assign static IP or<br/>increase DHCP pool size]
B -->|WL_IDLE_STATUS| L[WiFi not initialized:<br/>WiFi.mode WIFI_STA<br/>before WiFi.begin]
B -->|WL_CONNECTED but<br/>no internet| M{Can ping gateway?}
M -->|No| N[Local network issue:<br/>Check subnet mask<br/>Check gateway IP]
M -->|Yes| O[DNS or internet issue:<br/>Try 8.8.8.8 for DNS<br/>Check if router has internet]
style A fill:#ff6b6b
style C fill:#4ecdc4
style D fill:#4ecdc4
style F fill:#4ecdc4
style H fill:#4ecdc4
style J fill:#4ecdc4
style K fill:#4ecdc4
style L fill:#4ecdc4
style N fill:#4ecdc4
style O fill:#4ecdc4
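The bounded-retry pattern from the "never connects" branch generalizes beyond the ESP32. A sketch of the logic in Python (the actual device code would poll WiFi.status() in C++; names here are illustrative):

```python
import time

def wait_for_connection(status_fn, connected_value, max_attempts=20, delay_s=0.5):
    """Bounded connection wait, mirroring the flowchart's 'retry up to 20
    times with a 500 ms delay'. status_fn is polled until it returns
    connected_value or the attempt budget runs out. Returns (ok, attempts)."""
    for attempt in range(1, max_attempts + 1):
        if status_fn() == connected_value:
            return True, attempt
        time.sleep(delay_s)
    return False, max_attempts
```

The key point is the bound: an unbounded `while` on the status can hang the device forever when the access point is simply out of range.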
6. Sensor Reading Problems
When sensor data looks wrong or inconsistent.
flowchart TD
A[Start: Bad sensor readings] --> B{Sensor type?}
B -->|Analog ADC| C{Reading always 0 or 1023?}
C -->|Always max/min| D[Check wiring:<br/>- Need voltage divider?<br/>- Correct pin analog capable?<br/>- Ground connection OK?]
C -->|Values present but wrong| E{Calibrated?}
E -->|No| F[Implement calibration:<br/>- Record min/max values<br/>- Map to expected range<br/>- Account for offset/drift]
E -->|Yes but noisy| G[Add filtering:<br/>- Moving average window 5-10<br/>- Median filter for spikes<br/>- Low-pass RC filter hardware]
B -->|Digital I2C/SPI| H{Communication working?}
H -->|No response| I[Check I2C address:<br/>- Scan for devices<br/>- Check A0 A1 jumpers<br/>- Verify pull-up resistors]
H -->|Returns 0xFF or error| J[Check timing:<br/>- Clock speed too fast?<br/>- Adequate delays?<br/>- Power supply stable?]
H -->|Intermittent| K[Check connections:<br/>- Loose wires?<br/>- Cable length too long?<br/>- Electromagnetic interference?]
B -->|Timing-sensitive| L{Consistent sample rate?}
L -->|Irregular timing| M[Use millis for timing:<br/>unsigned long last = 0<br/>if millis - last >= interval<br/> read sensor]
L -->|Regular but wrong values| N{Check sensor datasheet}
N -->|Wrong voltage| O[Voltage level issue:<br/>- 3.3V vs 5V logic<br/>- Use level shifter<br/>- Check sensor V rating]
N -->|Conversion needed| P[Apply formula from datasheet:<br/>- Temperature coefficients<br/>- Resistance to value<br/>- Raw to engineering units]
style A fill:#ff6b6b
style D fill:#4ecdc4
style F fill:#4ecdc4
style G fill:#4ecdc4
style I fill:#4ecdc4
style J fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#4ecdc4
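The two filters the flowchart recommends for noisy analog readings are easy to sketch. Illustrative Python (on a microcontroller you would implement the same ring-buffer logic in C++): a median filter rejects single-sample spikes, and a moving average smooths the residual noise.

```python
from collections import deque
from statistics import median

def moving_average(values, window=5):
    """Simple moving average; smooths ADC noise at the cost of some lag."""
    out, buf = [], deque(maxlen=window)
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

def median_filter(values, window=3):
    """Median filter; removes single-sample spikes without smearing edges."""
    out, buf = [], deque(maxlen=window)
    for v in values:
        buf.append(v)
        out.append(median(buf))
    return out
```

Running the median filter first and the moving average second usually gives the cleanest signal for downstream inference.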
Training Issues
7. Training Loss Not Decreasing
When your model isn’t learning during training.
flowchart TD
A[Start: Loss not decreasing] --> B{Loss value?}
B -->|Loss is NaN| C[Gradient explosion:<br/>- Reduce learning rate 10x<br/>- Add gradient clipping<br/>- Check for bad data inf/NaN]
B -->|Loss constant high| D{Check learning rate}
D -->|LR too small<br/>1e-6 or less| E[Increase learning rate:<br/>Try 1e-3 for Adam<br/>Try 0.01 for SGD]
D -->|LR reasonable| F{Data preprocessed?}
F -->|No normalization| G[Normalize inputs:<br/>x = x / 255.0 for images<br/>StandardScaler for tabular<br/>Mean 0 std 1 generally]
F -->|Already normalized| H{Check labels}
H -->|All same label<br/>or wrong format| I[Fix label issues:<br/>- One-hot encode categorical<br/>- Balance class distribution<br/>- Verify ground truth correct]
H -->|Labels OK| J{Model capacity?}
J -->|Too small| K[Increase model size:<br/>- Add layers<br/>- Increase units per layer<br/>- But avoid overfitting]
J -->|Reasonable size| L{Activation functions?}
L -->|All linear or wrong| M[Use proper activations:<br/>- ReLU for hidden layers<br/>- Softmax for classification<br/>- Sigmoid for binary]
L -->|Activations OK| N{Loss function correct?}
N -->|Mismatch with task| O[Match loss to task:<br/>- Categorical crossentropy multi-class<br/>- Binary crossentropy binary<br/>- MSE for regression]
N -->|Correct loss| P[Try different optimizer:<br/>- Adam usually works<br/>- SGD with momentum<br/>- Reduce batch size if large]
style A fill:#ff6b6b
style C fill:#4ecdc4
style E fill:#4ecdc4
style G fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#4ecdc4
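Gradient clipping, the flowchart's remedy for NaN losses, is worth seeing in the concrete. A pure-Python sketch of clipping by global norm (frameworks provide this built in, e.g. as a `clipnorm` option on Keras optimizers; this just shows the arithmetic):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient values so their global L2 norm does not
    exceed max_norm - the standard remedy when loss explodes to NaN.
    Gradients below the threshold pass through unchanged."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Note the clip preserves gradient direction: all components are scaled by the same factor, so only the step size shrinks.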
8. Overfitting Detection and Solutions
When training accuracy is high but validation accuracy is low.
flowchart TD
A[Start: Train acc high<br/>Val acc low] --> B{Gap between accuracies?}
B -->|>20% difference| C[Severe overfitting detected]
B -->|10-20% difference| D[Moderate overfitting]
B -->|<10% difference| E[Mild - may be acceptable<br/>for edge ML trade-off]
C --> F{Dataset size?}
F -->|<100 samples/class| G[Collect more data:<br/>- Aim for 500+ per class<br/>- Critical for deep learning<br/>- Data beats algorithms]
F -->|Adequate data| H{Using augmentation?}
H -->|No| I[Add data augmentation:<br/>- Random flips rotations<br/>- Noise injection<br/>- Time warping for audio/sensors]
H -->|Yes| J{Regularization applied?}
J -->|No regularization| K[Add regularization:<br/>- L2 weight decay 1e-4<br/>- Dropout 0.3-0.5 after FC layers<br/>- Batch normalization]
J -->|Already using| L{Model complexity?}
L -->|Very deep/wide| M[Reduce model size:<br/>- Fewer layers<br/>- Smaller hidden dims<br/>- Early stopping on val loss]
L -->|Simple model| N{Check validation set}
N -->|Different distribution| O[Fix data split:<br/>- Stratified split by class<br/>- Shuffle before split<br/>- Ensure representative sample]
N -->|Distribution matches| P[Consider ensemble:<br/>- Multiple models vote<br/>- Or accept slight overfit<br/>- Edge models trade accuracy]
D --> J
E --> Q[Monitor in production<br/>May need retraining with<br/>real-world data later]
style A fill:#ff6b6b
style G fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#4ecdc4
style Q fill:#ffe66d
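Early stopping on validation loss, recommended in the model-size branch, comes down to a patience counter. A sketch of the logic (Keras offers this as the EarlyStopping callback; this shows what it does):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop: when the best
    validation loss has not improved for `patience` consecutive epochs.
    Returns the last epoch if training runs to completion."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1
```

In practice you would also restore the weights from the best epoch, not the stopping epoch, since the last few epochs were by definition not improving.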
9. Quantization Accuracy Drop
When converting from float32 to int8 causes significant accuracy loss.
flowchart TD
A[Start: Accuracy drops<br/>after quantization] --> B{Accuracy drop amount?}
B -->|>10% drop| C[Severe degradation]
B -->|5-10% drop| D[Moderate - may be fixable]
B -->|<5% drop| E[Acceptable for edge<br/>deployment trade-off]
C --> F{Using representative dataset?}
F -->|No rep_dataset| G[Critical - add representative data:<br/>def rep_data_gen:<br/> for sample in dataset<br/> yield sample<br/>converter.representative_dataset = rep_data_gen]
F -->|Have rep_dataset| H{Dataset diverse enough?}
H -->|<100 samples or<br/>single scenario| I[Expand representative dataset:<br/>- Include all input variations<br/>- Cover activation ranges<br/>- Multiple scenarios/users]
H -->|Dataset adequate| J{Try quantization-aware training}
J -->|Not using QAT| K[Implement QAT:<br/>model = tfmot.quantization.keras<br/> .quantize_model model<br/>Train with quantization simulation]
J -->|Already using QAT| L{Check activation ranges}
L -->|Extreme outliers| M[Fix extreme activations:<br/>- Clip outliers in preprocessing<br/>- Use batch normalization<br/>- Scale input range properly]
L -->|Ranges normal| N{Specific layer causing issue?}
N -->|One layer drops accuracy| O[Keep that layer in float:<br/>- Use selective quantization<br/>- Annotate layers via QAT<br/>- Or try float16 quantization]
N -->|Whole model degrades| P[Model may not be quantizable:<br/>- Try different architecture<br/>- Use larger model to compensate<br/>- Consider dynamic range quant]
D --> F
E --> Q[Deploy and monitor<br/>Acceptable for most<br/>edge ML applications]
style A fill:#ff6b6b
style G fill:#4ecdc4
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#ffe66d
style Q fill:#95e1d3
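Why the representative dataset matters becomes clear from the quantization arithmetic itself. A sketch of TFLite-style affine int8 quantization (scale and zero-point values here are illustrative): in-range values round-trip with error at most half a step, but values outside the calibrated range clamp hard.

```python
def quantize(x, scale, zero_point):
    """Affine int8 quantization: q = round(x / scale) + zero_point,
    clamped to the int8 range [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Inverse mapping back to float."""
    return (q - zero_point) * scale
```

If the representative dataset never exercised large activations, the calibrated scale is too small and real inputs saturate at 127, which is exactly the "extreme outliers" failure mode in the flowchart.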
Deployment Issues
10. TFLite Conversion Errors
When converting your Keras model to TFLite format fails.
flowchart TD
A[Start: TFLite conversion fails] --> B{Error message type?}
B -->|Unsupported operation| C{Which operation?}
C -->|Custom layer| D[Replace or reimplement:<br/>- Use built-in equivalent<br/>- Implement as TFLite custom op<br/>- Redesign model architecture]
C -->|Standard op but flagged| E[Enable TF op fallback:<br/>supported_ops = <br/> TFLITE_BUILTINS<br/> SELECT_TF_OPS]
B -->|Dynamic tensor shape| F{Where are dynamic shapes?}
F -->|Input layer| G[Set explicit input shape:<br/>Input shape=fixed_shape<br/>Avoid None dimensions]
F -->|Internal layers| H[Redesign model:<br/>- Remove dynamic reshaping<br/>- Use fixed size tensors<br/>- Pad to max size if needed]
B -->|Graph optimization failed| I{Model complexity?}
I -->|Very complex graph| J[Simplify model:<br/>- Remove unnecessary ops<br/>- Fuse batch norm into conv<br/>- Remove training-only ops]
I -->|Simple model| K{TensorFlow version?}
K -->|TF 2.0-2.3 old| L[Update TensorFlow:<br/>pip install tensorflow==2.10<br/>Rebuild model with new version]
K -->|Version OK| M[Try different converter:<br/>from_keras_model vs<br/>from_saved_model vs<br/>from_concrete_functions]
B -->|Quantization error| N[See Quantization flowchart<br/>Check representative dataset<br/>Try post-training quant only]
B -->|Model is None| O{Model saved correctly?}
O -->|Not saved| P[Save model first:<br/>model.save model.h5<br/>or tf.saved_model.save]
O -->|Saved but corrupt| Q[Re-train and save:<br/>Check disk space<br/>Verify file integrity]
style A fill:#ff6b6b
style D fill:#4ecdc4
style E fill:#4ecdc4
style G fill:#4ecdc4
style H fill:#4ecdc4
style J fill:#4ecdc4
style L fill:#4ecdc4
style M fill:#4ecdc4
style N fill:#ffe66d
style P fill:#4ecdc4
style Q fill:#4ecdc4
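For the dynamic-shape branch, the fix is making every dimension static before conversion. A sketch of the idea (the default fill size of 96 is an arbitrary illustration; use your model's real input size):

```python
def fix_input_shape(shape, batch=1, default=96):
    """Replace None (dynamic) dimensions with fixed sizes: TFLite Micro
    requires fully static shapes. The first dim is treated as batch;
    any other unknown dim gets a chosen fixed size."""
    fixed = []
    for i, dim in enumerate(shape):
        if dim is None:
            fixed.append(batch if i == 0 else default)
        else:
            fixed.append(dim)
    return tuple(fixed)
```

In Keras this corresponds to declaring `Input(shape=...)` with concrete integers rather than leaving dimensions as None.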
11. Real-Time Performance Issues
When inference is too slow for real-time operation.
flowchart TD
A[Start: Inference too slow] --> B{Measure current latency}
B --> C{Latency vs requirement?}
C -->|2-5x too slow| D[Significant optimization needed]
C -->|1.5-2x too slow| E[Minor optimization may suffice]
C -->|Just barely slow| F[Fine-tune existing setup]
D --> G{Platform?}
G -->|Microcontroller| H{Model size?}
H -->|Large model| I[Reduce model complexity:<br/>- Fewer layers depth<br/>- Smaller kernels 3x3 not 5x5<br/>- Reduce channels width<br/>- Use depthwise separable conv]
H -->|Already minimal| J{Optimize ops}
J --> K[Profile which ops slow:<br/>- Use CMSIS-NN optimizations<br/>- Enable hardware acceleration<br/>- Check if ops are optimized<br/>- Consider assembly for critical ops]
G -->|Raspberry Pi| L{Using TFLite?}
L -->|Using full TensorFlow| M[Switch to TFLite:<br/>interpreter = tf.lite.Interpreter<br/>4-10x faster than full TF]
L -->|Already TFLite| N{Threading enabled?}
N -->|Single thread| O[Enable multi-threading:<br/>Interpreter num_threads=4<br/>Use all CPU cores]
N -->|Multi-threaded| P{Hardware acceleration?}
P -->|No accelerator| Q[Use available hardware:<br/>- Coral USB Accelerator Edge TPU<br/>- Intel Neural Compute Stick 2<br/>- GPU if available]
P -->|Using accelerator| R[Optimize model for accelerator:<br/>- INT8 for Edge TPU<br/>- Check supported ops<br/>- Profile bottlenecks]
E --> S{Reduce input size?}
S -->|Can downsample| T[Reduce input dimensions:<br/>- 96x96 instead of 224x224<br/>- Lower audio sample rate<br/>- Skip frames temporal stride]
S -->|Input size fixed| U[Batch processing if applicable<br/>or reduce inference frequency]
F --> V[Fine-tune parameters:<br/>- Compiler optimizations -O3<br/>- Reduce logging overhead<br/>- Check sensor read time]
style A fill:#ff6b6b
style I fill:#4ecdc4
style K fill:#4ecdc4
style M fill:#4ecdc4
style O fill:#4ecdc4
style Q fill:#4ecdc4
style R fill:#4ecdc4
style T fill:#4ecdc4
style U fill:#4ecdc4
style V fill:#4ecdc4
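The flowchart starts with "measure current latency", and how you measure matters. A sketch of a latency profiler (wrap your real inference call in `infer_fn`): discard warmup runs so lazy initialization and cache effects don't skew the numbers, and track the worst case, since real-time deadlines are violated by the tail, not the mean.

```python
import time

def profile_latency(infer_fn, warmup=3, runs=20):
    """Measure inference latency in milliseconds. Returns (mean_ms, worst_ms).
    Warmup runs are discarded; worst case matters most for real-time deadlines."""
    for _ in range(warmup):
        infer_fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer_fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    return sum(times) / len(times), max(times)
```

Compare the worst-case figure, not the mean, against your frame budget when deciding how much optimization the flowchart's branches require.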
12. Power Consumption Too High
When your battery-powered device drains too quickly.
flowchart TD
A[Start: Battery drains too fast] --> B{Measure current draw}
B --> C{When is power high?}
C -->|Always high even idle| D[Idle power issue]
C -->|High during inference| E[Inference power issue]
C -->|High during wireless| F[Radio power issue]
D --> G{Sleep modes enabled?}
G -->|No sleep| H[Implement sleep modes:<br/>- Deep sleep between samples<br/>- Light sleep during idle<br/>- Wake on interrupt not polling]
G -->|Sleep enabled| I{Peripherals powered down?}
I -->|Always on| J[Disable unused peripherals:<br/>- Turn off LEDs<br/>- Power down sensors when idle<br/>- Disable USB if not needed]
I -->|Optimized| K[Check for current leaks:<br/>- Pull-up/down resistors<br/>- Floating pins<br/>- LDO efficiency]
E --> L{Inference frequency?}
L -->|Very frequent| M[Reduce inference rate:<br/>- 1 Hz instead of 10 Hz<br/>- On-demand vs continuous<br/>- Motion trigger activation]
L -->|Already low| N{Model efficiency?}
N -->|Large complex model| O[Optimize model:<br/>- Smaller architecture<br/>- INT8 quantization<br/>- Prune unnecessary weights<br/>- Knowledge distillation]
N -->|Efficient model| P[Hardware acceleration:<br/>- Dedicated ML accelerator<br/>- Lower voltage operation<br/>- Better power profile MCU]
F --> Q{WiFi always on?}
Q -->|Yes| R[Optimize radio usage:<br/>- Connect only when needed<br/>- Reduce TX power<br/>- Batch transmissions<br/>- Use BLE instead of WiFi]
Q -->|Optimized usage| S{Connection parameters?}
S -->|Frequent reconnects<br/>or poor signal| T[Improve connectivity:<br/>- Keep-alive intervals<br/>- Better antenna position<br/>- Closer to AP<br/>- Lower data rate trade quality]
S -->|Parameters good| U[Consider different protocol:<br/>- LoRa for long range low power<br/>- BLE for short range<br/>- Zigbee for mesh networks]
style A fill:#ff6b6b
style H fill:#4ecdc4
style J fill:#4ecdc4
style K fill:#ffe66d
style M fill:#4ecdc4
style O fill:#4ecdc4
style P fill:#4ecdc4
style R fill:#4ecdc4
style T fill:#4ecdc4
style U fill:#4ecdc4
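The payoff of the deep-sleep branch is easiest to see with duty-cycle arithmetic. A sketch (all current figures are illustrative, not from any specific MCU's datasheet): average current over one wake/sleep period determines battery life.

```python
def battery_life_hours(capacity_mah, active_ma, active_ms, sleep_ua, period_ms):
    """Duty-cycle power estimate: average current over one wake/sleep
    period, then hours of battery life at that average draw."""
    sleep_ms = period_ms - active_ms
    avg_ma = (active_ma * active_ms + (sleep_ua / 1000.0) * sleep_ms) / period_ms
    return capacity_mah / avg_ma
```

For example, a device drawing 100 mA for 100 ms of every 10 s and sleeping at 10 µA otherwise averages about 1 mA, turning a 10-hour always-on runtime into weeks. This is why sleep-mode fixes come first in the flowchart.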