Last week, Qualcomm shook up the maker world by acquiring Arduino. A new board, the Arduino Uno Q, was launched at the event "From Blink to Think". The board promises Linux-powered AI performance while keeping microcontroller-level control. It's a big leap for Arduino into single board computers. I am super excited to try it out once it is delivered. But let's be real: Arduinos have been AI capable and "thinking" for almost a decade now. In this post, we will show that deploying ML models on Arduino is possible even without the Uno Q SBC.
Objectives:
- Understand Arduino Uno Q, and what to expect from it
- A peek into the past AI capable Arduino boards
- Train a classification model from scratch using TensorFlow
- Deploy the TFLite model on Arduino Nano 33 BLE
- Build a Gradio interface for inferencing using the BLE feature
- Arduino Uno Q SBC: What's New?
- AI Ready Arduino Boards in the Past
- Installation of Tools and Packages
- Training Classification Model on MNIST Digits Dataset
- Sketch for Deploying ML Models on Arduino
- Gradio App to Manage Inputs for Deploying ML Models on Arduino
- Conclusion: Deploying ML Models on Arduino
Arduino Uno Q SBC: What's New?
The Arduino Uno Q marks a major evolution in the Arduino universe. It combines Linux-capable computing with real-time microcontroller control on a single board. The board comes with a hybrid "dual-brain" architecture. It means developers can run AI, vision, and audio workloads while still managing precise timing and control via the MCU. Following are the specifications of the Arduino Uno Q.
Micro Processor Unit, MPU:
- Qualcomm Dragonwing QRB2210 MPU (quad-core Cortex-A53, up to 2.0 GHz)
- Adreno GPU 3D graphics accelerator
- Memory: 2GB LPDDR4 | higher variants planned in future
- Storage: 16 GB eMMC built-in
Micro Controller Unit, MCU:
- STM32U585 MCU (Cortex-M33)
- 2MB Flash
- 786 kB SRAM
- Floating Point Unit, Single Precision ( FP32 only )
Is Arduino Uno Q a Raspberry Pi Killer?
Arduino's philosophy is centered on low energy consumption, efficiency, and speed. Looking at the specifications, the Uno Q is nowhere close to the Raspberry Pi 5. It is not designed to replace the Raspberry Pi but rather to bridge the gap between full-fledged SBCs and MCUs. I am not sure how the lineup will branch out in the future, though.
The UNO Q's MCU gives it deterministic timing for tasks like motor control, sensor fusion, robotics, etc. A Raspberry Pi alone can't do these things precisely. Interested in measuring the Pi's power? Check out my previous article on Raspberry Pi: VLM on Edge.
It also has a dedicated GPU and optimized neural capabilities through Qualcomm's SDKs, giving it a solid edge in AI + control integration.
AI Ready Arduino Boards in the Past: TinyML Integrations
Arduino Uno Q is not the first AI capable board that Arduino developed. Long before this, several boards were already capable of AI or TinyML workloads. These boards laid the foundation for edge AI on Arduino, long before hybrid SBCs like the Uno Q emerged.
Following are a few boards where TinyML models could be deployed.
- Arduino Yun (2013)
- Arduino Tian (2016)
- Arduino Nano 33 BLE (2019)
- Portenta H7 (2019)
- Portenta X8 (2022)
Note that the Portenta MCU is still a lot more powerful than the one in the Arduino Uno Q. For the scope of this blog post, we will be using the Arduino Nano 33 BLE. It has 256 kB SRAM and 1 MB flash storage.
Yes! You are reading it right. Just 256 kB RAM. That's ridiculously little compared to what we have on mainstream computers or SBCs nowadays. For example, my phone has 12 GB RAM.
Why Arduino Nano 33 BLE Now After Five Years?
I got this board from the US during the COVID-19 pandemic. It was bought just as a collectible, hoping to do something with it. Back then, I had decent knowledge of working with embedded systems from hobby projects. However, I had little to no knowledge of Machine Learning, so I could not implement anything on it. After a while it was forgotten, and it remained hidden for a while 😅. That changed recently, when I heard the news of Qualcomm acquiring Arduino and releasing the Uno Q SBC while I was rearranging my collection.
In between, my domain of research shifted to Classical Computer Vision, and then Deep Learning. It was definitely not an easy journey. Fortunately, I got introduced to OpenCV Courses early. They have very well structured modules for Deep Learning using TensorFlow and PyTorch. Check out the OpenCV courses below; they were worth my time.
Installation of Necessary Tools for Deploying ML Models on Arduino
We will need to install the Arduino IDE, TensorFlow, and some helper packages for the BLE board. You can go ahead and install the Arduino IDE from the official website here. Once done, download the Arduino TFLite support package provided with the download code. It's available under root > libs > Arduino_TensorFlowLite.zip.
Step 1: Open the Arduino IDE and go to Sketch > Include Library > Manage Libraries in the menu bar. Search for ArduinoBLE, and install the package.
Step 2: Similarly, go to Sketch > Include Library, but this time proceed through the "Add .ZIP Library" submenu. Navigate to the downloaded code folder and select Arduino_TensorFlowLite.zip when prompted.
Step 3: We will need the xxd tool to convert TFLite models to Arduino-compatible header files. On macOS, it comes pre-installed with the vim editor. If not, install vim using the link.
On Windows, it is not pre-installed. Use the same link provided above to download and install vim, then add the vim installation directory to the PATH environment variable.
On Ubuntu, you can install it using sudo apt install xxd. Verify the installation with the following command in a terminal/command prompt.
xxd --version
Step 4: Now go ahead and install TensorFlow in a Python or conda environment. Following this, install Gradio using the pip install gradio command. The Gradio app later in this post also imports bleak for BLE communication, which can be installed the same way (pip install bleak). That's all we need for now.
Training Classification Model on MNIST Digits Dataset
To demonstrate TinyML in action, let's train a simple digit classification model on the MNIST dataset. It contains 60,000 training and 10,000 test grayscale images of handwritten digits. We'll build and train a lightweight neural network from scratch. The model will learn to identify digits by extracting spatial and intensity patterns from 28×28 pixel images, and we will be using a CNN architecture.
Import Dependencies
import os, random
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from PIL import Image, ImageFilter
import matplotlib.pyplot as plt
Load and Pre-Process MNIST Classification Dataset
We begin by loading and preprocessing the MNIST dataset, which contains grayscale images of handwritten digits from 0 to 9. Each image is normalized between 0 and 1 and reshaped to include a single channel (28×28×1). We also convert the labels into one-hot encoded vectors for multi-class classification.
# Load and preprocess dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
# Expand dims -> (N, 28, 28, 1)
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)
# One-hot encode labels
y_train_onehot = tf.keras.utils.to_categorical(y_train, 10)
y_test_onehot = tf.keras.utils.to_categorical(y_test, 10)
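To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what the normalization and one-hot encoding do to a few pixel values and a single label. The helper names normalize_pixels and one_hot are hypothetical and only for illustration; the post itself relies on NumPy and tf.keras.utils.to_categorical for this.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) to floats in the range 0.0..1.0."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Black, mid-gray, and white pixels after normalization
print(normalize_pixels([0, 128, 255]))
# The digit 3 as a 10-element one-hot vector
print(one_hot(3))
```

The one-hot form is what the categorical cross-entropy loss used later expects: one target value per output neuron.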
Add Data Augmentation
We have added more advanced augmentations in the notebook as well. I am not explaining them here to limit the length of the blog. However, if you are in doubt, please feel free to ask in the comments below. We also have a very detailed blog post on Implementing a CNN using TensorFlow and Keras. Check it out for more details.
# Base geometric augmentations
base_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    fill_mode="nearest"
)
base_datagen.fit(X_train)
Create and Compile A Compact CNN Model
Our CNN consists of two convolutional layers with ReLU activation that progressively learn spatial features. Each is followed by a max pooling layer to reduce spatial dimensions and extract dominant features. The output is then flattened and passed through a fully connected dense layer with 64 neurons for feature integration, followed by a softmax output layer that classifies the image into one of ten categories.
Model Summary:
- Input: 28×28 grayscale image
- Parameters: about 56,700 trainable weights
- Loss: Categorical Crossentropy
- Optimizer: Adam
- Metric: Accuracy
# Compact CNN
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
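The per-layer numbers that model.summary() prints can also be worked out by hand from the layer shapes. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in the model code (valid padding for the 3×3 convolutions, 2×2 pooling that rounds down). The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units


def conv_then_pool(size, kernel=3, pool=2):
    """Spatial size after a valid convolution followed by max pooling."""
    return (size - kernel + 1) // pool


size = conv_then_pool(28)      # 28 -> 26 -> 13
size = conv_then_pool(size)    # 13 -> 11 -> 5
flattened = size * size * 32   # 5 * 5 * 32 values enter the dense layer

layer_params = {
    "conv1": conv2d_params(3, 1, 16),
    "conv2": conv2d_params(3, 16, 32),
    "dense1": dense_params(flattened, 64),
    "dense2": dense_params(64, 10),
}
total = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print("total:", total)
```

Almost all of the weights sit in the first dense layer, so shrinking it (or adding another pooling stage before it) is the most effective way to reduce the model size further.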
Train and Evaluate the Model
# Train
batch_size = 64
epochs = 20
# NOTE: augmented_generator is not shown in this post; see the augmentation
# code in the notebook that comes with the download code
history = model.fit(
    augmented_generator(X_train, y_train_onehot, batch_size),
    validation_data=(X_test, y_test_onehot),
    steps_per_epoch=len(X_train)//batch_size,
    epochs=epochs,
    verbose=1
)
# Evaluate
loss, acc = model.evaluate(X_test, y_test_onehot, verbose=0)
print(f"Test Accuracy: {acc:.4f}")
Quantization of the Classification Model
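The nRF52840 on the Arduino Nano 33 BLE (a Cortex-M4F) does have a single-precision floating point unit, but with only 256 kB SRAM and 1 MB flash, memory is the real constraint. An INT8 model stores each weight in one byte instead of four, so it is roughly 4x smaller than its float32 counterpart, and integer kernels generally run faster on the MCU. Hence, we convert the model to INT8 quantized.
We are using a representative dataset from the training data to calibrate the activation ranges during conversion. It ensures accurate scaling from float32 to int8. The final quantized model is then saved as a .tflite file.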
# Quantization (INT8)
def representative_dataset():
    # Yield 1000 single-image batches for calibration
    for i in range(1000):
        img = X_train[i:i+1].astype(np.float32)
        yield [img]
# Initialise converter
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enable quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Assign calibration data
converter.representative_dataset = representative_dataset
# Force INT8 quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Convert model
tflite_model = converter.convert()
# Save TFLite model
with open("digits_model_cnn_small_int8.tflite", "wb") as f:
    f.write(tflite_model)
print("INT8 TFLite compact model saved.")
print("Model size:", len(tflite_model)/1024, "KB")
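To give a feel for what the calibration step computes, here is a minimal sketch of affine (scale and zero point) quantization for a single tensor. It assumes the observed value range is known, which is what the representative dataset provides for activations. The function names are hypothetical, and the real converter handles more details (for example, it treats weights differently from activations), so treat this as an illustration rather than the converter's exact procedure.

```python
def affine_params(rmin, rmax, qmin=-128, qmax=127):
    """Derive scale and zero point so that [rmin, rmax] maps onto [qmin, qmax]."""
    # The real-valued range must contain 0.0 so that zero is exactly representable
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to int8, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to its approximate float value."""
    return (q - zero_point) * scale


# Normalized MNIST pixels lie in 0.0..1.0
scale, zero_point = affine_params(0.0, 1.0)
print("scale:", scale, "zero point:", zero_point)
print("0.0 ->", quantize(0.0, scale, zero_point))
print("1.0 ->", quantize(1.0, scale, zero_point))
```

For an input range of 0.0 to 1.0 the scale works out to 1/255 with a zero point of -128, which is why subtracting 128 from a raw 0–255 pixel is enough to produce a valid int8 input on the device.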
Convert To Arduino Compatible Header File
We will now convert the quantized TFLite model to an Arduino-compatible header file. Run the following cell in the current working directory.
!xxd -i digits_model_cnn_small_int8.tflite > digits_model_cnn_small_int8.h
Note: Sometimes the header file may get saved in UTF-16 format, which is not supported by the Arduino compiler. This was observed on Windows; if it happens, convert the file to UTF-8 using a Windows text editor. Check out the video below for the steps.
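If xxd is not available, or the UTF-16 issue from the note above keeps coming back, the header can also be written from Python with an explicit encoding. The sketch below is a hypothetical helper (tflite_to_header is not part of the download code) that mimics the layout of xxd -i for a file in the current directory: a byte array named after the file, plus a length variable. It does not reproduce every xxd naming rule, such as the prefix xxd adds when a file name starts with a digit.

```python
import tempfile
from pathlib import Path


def tflite_to_header(tflite_path, header_path, bytes_per_line=12):
    """Write the bytes of a .tflite file as a C array in a UTF-8 header file."""
    source = Path(tflite_path)
    data = source.read_bytes()
    # Like xxd -i, build the variable name from the file name,
    # replacing characters that are not letters or digits with "_"
    var_name = "".join(c if c.isalnum() else "_" for c in source.name)

    lines = [f"unsigned char {var_name}[] = {{"]
    for start in range(0, len(data), bytes_per_line):
        row = data[start:start + bytes_per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in row) + ",")
    lines.append("};")
    lines.append(f"unsigned int {var_name}_len = {len(data)};")

    Path(header_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return var_name


# Demo on a small dummy file so the sketch runs without a trained model
with tempfile.TemporaryDirectory() as folder:
    dummy = Path(folder) / "digits_model_cnn_small_int8.tflite"
    dummy.write_bytes(bytes(range(20)))
    header = Path(folder) / "digits_model_cnn_small_int8.h"
    print(tflite_to_header(dummy, header))
    print(header.read_text(encoding="utf-8"))
```

For the real model file, the generated array name is digits_model_cnn_small_int8_tflite, which is the symbol the Arduino sketch passes to tflite::GetModel().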
Sketch for Deploying ML Models on Arduino
Once the header file is ready in the correct format, create a new sketch in the Arduino IDE and save it (any name). The .ino file and its folder will have the same name; in the download code, it is Digit-Classifier-CNN.ino. Move the header file digits_model_cnn_small_int8.h to this directory. As you can see in the downloaded code folder, it is already present. At this point, you can connect the Arduino Nano 33 BLE board and upload the code directly for a test run. It should compile and upload successfully.
After running the manager_over_ble.py script, you should be able to send images over Bluetooth and get predictions. However, this isn't going to work yet, as the MAC address of your Bluetooth device will be different.
Retrieve Bluetooth device MAC Address
Let's go ahead and create a new Arduino sketch as shown below. It will simply print the address of your Arduino BLE device. Make sure the baud rate in the code matches the one in the Serial Monitor, otherwise you will only see garbage values. In case you are seeing nothing, check if you have selected the correct USB port. Also, reset the board once using the physical button on the board (single click, while connected to the USB port).
#include <ArduinoBLE.h>
void setup() {
  Serial.begin(115200);
  while (!Serial);
  if (!BLE.begin()) {
    Serial.println("Starting BLE failed!");
    while (1);
  }
  // Retrieve and print the local BLE MAC address
  String mac = BLE.address();
  Serial.print("BLE MAC Address: ");
  Serial.println(mac);
}
void loop() {
}
Re-upload the Digit-Classifier-CNN.ino code to the board and modify the manager_over_ble.py script with the proper address. Now you should be able to upload images and get predictions. Let's take a look at the Arduino sketch now.
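When copying the address from the Serial Monitor into the Python script, a stray prefix or a typo is easy to miss. A small check like the one below can catch that early. This is only a sketch: normalize_mac is a hypothetical helper, not something manager_over_ble.py is known to contain, and it assumes the colon-separated six-byte format printed by the sketch above.

```python
import re

# Six pairs of hex digits separated by colons, e.g. 84:45:7d:35:39:74
_MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(raw):
    """Strip the Serial Monitor prefix, lowercase, and validate a MAC address."""
    text = raw.strip()
    prefix = "BLE MAC Address:"
    if text.startswith(prefix):
        text = text[len(prefix):].strip()
    mac = text.lower().replace("-", ":")
    if not _MAC_PATTERN.match(mac):
        raise ValueError(f"not a valid MAC address: {raw!r}")
    return mac


print(normalize_mac("BLE MAC Address: 84:45:7D:35:39:74"))
```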
Include Libraries and Dependencies to Run ML Model on Arduino
#include <ArduinoBLE.h>
#include <TensorFlowLite.h>
#include "digits_model_cnn_small_int8.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"
#include <Arduino.h>
Define Global Variables To Prepare Arduino Nano 33 BLE
In the following snippet, we define the memory and model structures for TFLite inference on-device, set up BLE (Bluetooth Low Energy), and add a debug logger for serial output.
Communication Setup
Since we are using Bluetooth for communication, we have to set up these variables. The BLE device can't receive the whole image at once. Its maximum intake capacity is 128 bytes at a time. Hence, we have to send the image in chunks.
- 28×28 grayscale image, so 784 bytes are allocated for incoming pixel data
- received_bytes variable tracks how many bytes have been received
- image_ready is a flag that becomes true when a full image is received
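As a quick sanity check on the chunking arithmetic, the sketch below splits a 784-byte payload into pieces of at most 128 bytes, the chunk size described above. The helper name split_into_chunks is hypothetical.

```python
def split_into_chunks(payload, chunk_size=128):
    """Split a bytes object into consecutive pieces of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


image_bytes = bytes(28 * 28)  # placeholder for one 28x28 grayscale image
chunks = split_into_chunks(image_bytes)

print("number of chunks:", len(chunks))
print("chunk sizes:", [len(c) for c in chunks])
```

A full image therefore takes seven writes: six full 128-byte chunks and a final 16-byte chunk.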
TFLite Micro Setup
Next is the TensorFlow Lite Micro setup. This is the library that enables TinyML on microcontrollers. You can check out the GitHub repository for the amazing work done so far. However, the Arduino TFLite library was removed from the Arduino Library Manager some time back (it was kind of a duplicate). You might see some errors if you use the package from GitHub directly. No worries here, I have uploaded a ZIP file in the download code. Let's see what's in the TFLite Micro setup globals.
- tflErrorReporter handles error and debug messages
- resolver registers all available TFLite operators (Conv2D, Dense, etc.)
- model will point to the loaded .tflite model in flash memory
- interpreter runs inference using the model
- input and output are pointers to the model's input/output tensors
- tensorArena is a memory buffer (50 KB) where all intermediate tensors and activations are stored during inference
One of the areas where you have to be careful is the tensorArena size. This memory is reserved specifically for inference. It is a static memory block; no dynamic memory allocation happens during inferencing. With 50 kB of the 256 kB SRAM taken by the arena, about 206 kB remains for the program stack, BLE buffers, global variables, and other tasks. Making the arena bigger may cause the rest of the operations to fail, while making it too small will cause tensor allocation to fail when the model is loaded. If you are trying to fit a different model, you may need to experiment a little with the value.
// Parenthesized so the macro expands safely inside larger expressions
#define IMG_SIZE (28 * 28) // 784 bytes
uint8_t image_buffer[IMG_SIZE];
int received_bytes = 0;
bool image_ready = false;
// TensorFlow Lite globals
tflite::MicroErrorReporter tflErrorReporter;
tflite::AllOpsResolver resolver;
const tflite::Model* model;
tflite::MicroInterpreter* interpreter;
TfLiteTensor* input;
TfLiteTensor* output;
constexpr int tensorArenaSize = 50 * 1024;
uint8_t tensorArena[tensorArenaSize];
// BLE configuration
BLEService digitService("19b10000-e8f2-537e-4f6c-d104768a1214"); // custom service UUID
BLECharacteristic imageChar("19b10001-e8f2-537e-4f6c-d104768a1214", BLEWriteWithoutResponse | BLEWrite, IMG_SIZE);
BLECharacteristic resultChar("19b10002-e8f2-537e-4f6c-d104768a1214", BLERead | BLENotify, 32);
extern "C" void DebugLog(const char* s) {
  Serial.print(s);
}
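To get a rough feel for how large the tensor arena needs to be, you can add up the activation sizes of the model. The sketch below is a back-of-the-envelope estimate only: it assumes one byte per element (INT8), the layer shapes of the compact CNN trained earlier, and that a layer needs its input and output tensors alive at the same time. The interpreter also needs room for bookkeeping and scratch buffers, so the value reported by arena_used_bytes() on the device is the number to trust.

```python
# (name, height, width, channels) of each activation tensor in the compact CNN
ACTIVATIONS = [
    ("input", 28, 28, 1),
    ("conv1", 26, 26, 16),
    ("pool1", 13, 13, 16),
    ("conv2", 11, 11, 32),
    ("pool2", 5, 5, 32),
    ("dense1", 1, 1, 64),
    ("dense2", 1, 1, 10),
]

SRAM_KB = 256   # Nano 33 BLE SRAM
ARENA_KB = 50   # tensorArenaSize in the sketch


def tensor_bytes(height, width, channels, bytes_per_element=1):
    """Size of one activation tensor; INT8 uses one byte per element."""
    return height * width * channels * bytes_per_element


sizes = [tensor_bytes(h, w, c) for _, h, w, c in ACTIVATIONS]

# Largest input + output pair that must be alive at the same time
peak_pair = max(a + b for a, b in zip(sizes, sizes[1:]))

print("largest input+output pair:", peak_pair, "bytes")
print("SRAM left outside the arena:", SRAM_KB - ARENA_KB, "kB")
```

The estimate comes out well below 50 KB, which suggests the arena has headroom for this model; a larger model would shift that balance quickly because the first convolution output dominates.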
Setup Function for Arduino BLE
The setup() function initializes everything needed for running the classifier. We begin by starting serial communication for debugging, then initialize the BLE module. After the BLE setup, TensorFlow Lite Micro is initialized and the model is loaded.
It then checks for schema version compatibility and allocates memory for tensors within the predefined tensor arena. If everything succeeds, it retrieves pointers to the model's input and output tensors and reports how much of the tensor arena memory was used. Finally, the device is ready to receive images over BLE for inference.
void setup() {
  Serial.begin(115200);
  while (!Serial);
  Serial.println("Starting BLE Digit Classifier...");
  Serial.println("Initializing BLE...");
  if (!BLE.begin()) {
    Serial.println("Starting BLE failed!");
    while (1);
  }
  Serial.println("BLE initialized.");
  BLE.setLocalName("DigitClassifier");
  BLE.setAdvertisedService(digitService);
  digitService.addCharacteristic(imageChar);
  digitService.addCharacteristic(resultChar);
  BLE.addService(digitService);
  imageChar.writeValue((uint8_t)0);
  resultChar.writeValue("Waiting");
  Serial.println("Starting BLE advertise...");
  BLE.advertise();
  Serial.println("BLE Device Active, Waiting for Connection...");
  Serial.println("Initializing TensorFlow Lite...");
  model = tflite::GetModel(digits_model_cnn_small_int8_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model schema mismatch!");
    while (1);
  }
  interpreter = new tflite::MicroInterpreter(model, resolver, tensorArena, tensorArenaSize, &tflErrorReporter);
  Serial.println("Allocating tensors...");
  TfLiteStatus status = interpreter->AllocateTensors();
  if (status != kTfLiteOk) {
    Serial.println("Tensor allocation failed!");
    while (1);
  }
  input = interpreter->input(0);
  output = interpreter->output(0);
  Serial.println("Setup complete. Ready to receive images over BLE.");
  // Print memory used
  size_t used_memory = interpreter->arena_used_bytes();
  Serial.print("Tensor arena used: ");
  Serial.print(used_memory);
  Serial.print(" bytes / ");
  Serial.print(tensorArenaSize);
  Serial.println(" bytes total");
}
Function to Run Inference
The runInference() function performs on-device digit recognition by first converting the received 28×28 image from 0–255 to INT8 (-128 to 127) format, then running it through the TensorFlow Lite Micro model. It identifies the predicted digit by selecting the output with the highest score, dequantizes it to compute a confidence value, and sends the result via BLE.
void runInference() {
  // Map received 0..255 -> int8 -128..127
  for (int i = 0; i < IMG_SIZE; i++) {
    input->data.int8[i] = static_cast<int8_t>(image_buffer[i] - 128);
  }
  // Measure inference time
  unsigned long start_time = millis();
  TfLiteStatus invoke_status = interpreter->Invoke();
  unsigned long end_time = millis();
  if (invoke_status != kTfLiteOk) {
    Serial.println("Inference failed!");
    resultChar.writeValue("Error");
    return;
  }
  // Find best prediction
  int best = 0;
  for (int i = 1; i < output->dims->data[1]; i++) {
    if (output->data.int8[i] > output->data.int8[best]) best = i;
  }
  // Compute confidence
  float scale = output->params.scale;
  int zero_point = output->params.zero_point;
  float confidence = (output->data.int8[best] - zero_point) * scale;
  char result[32];
  sprintf(result, "Digit:%d Conf:%.2f", best, confidence);
  resultChar.writeValue(result);
  Serial.print("Predicted: ");
  Serial.println(result);
  Serial.print("Inference time (ms): ");
  Serial.println(end_time - start_time);
  Serial.print("Tensor arena used: ");
  Serial.println(interpreter->arena_used_bytes());
}
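The integer arithmetic in runInference() is easy to check on a desktop before flashing the board. The Python sketch below mirrors the two steps: shifting a raw pixel into the int8 range, and turning the winning raw score back into a confidence value. The scale and zero point used here (1/256 and -128) are assumed example values; on the device the real ones come from output->params. The function names are hypothetical.

```python
def pixel_to_int8(pixel):
    """Shift a raw 0..255 pixel into the int8 range -128..127."""
    if not 0 <= pixel <= 255:
        raise ValueError(f"pixel value out of range: {pixel}")
    return pixel - 128


def pick_digit(raw_scores, scale, zero_point):
    """Return (best index, dequantized confidence) for int8 output scores."""
    # max() keeps the first index on ties, like the strict ">" loop in the sketch
    best = max(range(len(raw_scores)), key=lambda i: raw_scores[i])
    confidence = (raw_scores[best] - zero_point) * scale
    return best, confidence


print(pixel_to_int8(0), pixel_to_int8(255))

# Made-up int8 scores for the ten digit classes
raw_scores = [-128, -128, -120, 100, -128, -128, -128, -108, -128, -128]
digit, confidence = pick_digit(raw_scores, scale=1 / 256, zero_point=-128)
print("digit:", digit, "confidence:", confidence)
```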
Main Loop Function Running Classifier ML Model on Arduino
void loop() {
  BLEDevice central = BLE.central();
  if (central) {
    Serial.print("Connected to central: ");
    Serial.println(central.address());
    received_bytes = 0;
    image_ready = false;
    while (central.connected()) {
      if (imageChar.written()) {
        int len = imageChar.valueLength();
        const uint8_t* data = imageChar.value();
        for (int i = 0; i < len && received_bytes < IMG_SIZE; i++) {
          image_buffer[received_bytes++] = data[i];
        }
        if (received_bytes >= IMG_SIZE) {
          image_ready = true;
          received_bytes = 0;
        }
      }
      if (image_ready) {
        Serial.println("Image received. Running inference...");
        runInference();
        image_ready = false;
      }
    }
    Serial.print("Disconnected from central: ");
    Serial.println(central.address());
  }
}
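The reassembly logic in loop() can be simulated on a desktop to see how the chunks add up to one image. The class below is a hypothetical Python mirror of that logic, not part of the download code: it copies incoming bytes into a fixed buffer, ignores anything beyond the image size within a chunk, and resets its counter once a full image has arrived.

```python
class ImageAssembler:
    """Collects BLE chunks until one full image has been received."""

    def __init__(self, image_size=28 * 28):
        self.image_size = image_size
        self.buffer = bytearray(image_size)
        self.received = 0

    def feed(self, chunk):
        """Store a chunk; return the full image as bytes once complete, else None."""
        for value in chunk:
            if self.received >= self.image_size:
                break  # extra bytes in this chunk are dropped, as in the sketch
            self.buffer[self.received] = value
            self.received += 1
        if self.received >= self.image_size:
            self.received = 0  # ready for the next image
            return bytes(self.buffer)
        return None


# Send a fake image in 128-byte chunks and watch for completion
fake_image = bytes(i % 256 for i in range(28 * 28))
assembler = ImageAssembler()
results = [assembler.feed(fake_image[i:i + 128]) for i in range(0, len(fake_image), 128)]

print("chunks fed:", len(results))
print("image complete on last chunk:", results[-1] == fake_image)
```

One consequence of this design is that a transfer interrupted halfway leaves received_bytes partly filled until the central disconnects and reconnects, which is when the sketch resets the counter.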
Gradio App to Manage Inputs for Deploying ML Models on Arduino
Import Dependencies
import time
import asyncio
import gradio as gr
import numpy as np
from PIL import Image
from bleak import BleakClient
# BLE configuration
DEVICE_ADDR = "84:45:7d:35:39:74" # Replace with your board's BLE MAC
IMG_UUID = "19b10001-e8f2-537e-4f6c-d104768a1214" # image write characteristic
RESULT_UUID = "19b10002-e8f2-537e-4f6c-d104768a1214" # result notify characteristic
TARGET_SIZE = (28, 28)
PREVIEW_SIZE = (128, 128)
CHUNK = 128 # BLE write chunk size in bytes
Function To Send Image to Arduino Nano BLE over Bluetooth
# Send image to BLE + wait for prediction
async def send_image_ble(image_path):
    # Load image and resize to match model input
    img = Image.open(image_path).convert("L").resize(TARGET_SIZE)
    arr = np.array(img, dtype=np.uint8)
    # Convert to bytes for BLE transfer
    data_bytes = arr.tobytes()
    async with BleakClient(DEVICE_ADDR) as client:
        if not client.is_connected:
            raise Exception("BLE connection failed")
        print("✅ Connected to BLE device")
        result_text = None
        # Callback for inference result
        def callback(sender, data):
            nonlocal result_text
            try:
                result_text = data.decode(errors="ignore").strip()
                print("Received result:", result_text)
            except Exception as e:
                print("Decode error:", e)
        await client.start_notify(RESULT_UUID, callback)
        # Send image in chunks (BLE-safe)
        print("Sending image data...")
        for i in range(0, len(data_bytes), CHUNK):
            await client.write_gatt_char(IMG_UUID, data_bytes[i:i+CHUNK], response=False)
            await asyncio.sleep(0.03)
        # Wait for MCU inference result (up to 100 x 0.05 s = 5 s)
        print("⏳ Waiting for inference result...")
        for _ in range(100):
            if result_text:
                break
            await asyncio.sleep(0.05)
        await client.stop_notify(RESULT_UUID)
    if result_text is None:
        result_text = "No response from MCU"
    return result_text, img.resize(PREVIEW_SIZE).convert("L")
def send_image_sync(image_path):
    """Synchronous wrapper for Gradio callback"""
    return asyncio.run(send_image_ble(image_path))
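The board replies with a short text such as "Digit:7 Conf:0.98", following the format string used in runInference(). If you want the digit and confidence as numbers on the Python side, for example to log them or to show them in separate widgets, a small parser is enough. The helper below is a hypothetical addition, not part of manager_over_ble.py.

```python
import re

# Matches the "Digit:%d Conf:%.2f" text produced by the Arduino sketch
_RESULT_PATTERN = re.compile(r"Digit:(\d+)\s+Conf:(-?\d+(?:\.\d+)?)")


def parse_result(text):
    """Return (digit, confidence) from the board's reply, or None if it does not match."""
    match = _RESULT_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


print(parse_result("Digit:7 Conf:0.98"))
print(parse_result("No response from MCU"))
```

Returning None for anything that does not match keeps the "Error" and "No response from MCU" cases easy to handle separately in the UI.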
Gradio App UI to Send Image and Receive Prediction
# Gradio UI
with gr.Blocks() as demo:
    gr.Markdown("## CNN Digit Classifier over BLE (Arduino Nano 33 BLE)")
    gr.Markdown(
        "Upload a **grayscale image (28×28)**. It'll be quantized and sent via BLE. "
        "Your CNN model on Arduino performs inference and returns the predicted digit."
    )
    gr.Image("../arduino-nano-33-BLE.jpg", show_label=False, elem_id="banner")
    with gr.Row():
        inp = gr.Image(type="filepath", label="Upload Image")
        out_text = gr.Textbox(label="Predicted Digit / Confidence")
        out_preview = gr.Image(label="Preprocessed 28×28 Preview")
    # Run inference whenever a new image is uploaded
    inp.change(fn=send_image_sync, inputs=inp, outputs=[out_text, out_preview])
if __name__ == "__main__":
    demo.launch()
Conclusion: Deploying ML Models on Arduino
That's all about deploying ML models on the Arduino Nano 33 BLE. I hope you enjoyed reading the article and found something new. This will not provide groundbreaking accuracy, but it will definitely get a basic ML pipeline working.
With frameworks like TensorFlow Lite for Microcontrollers, even compact boards such as the Nano 33 BLE, Portenta H7, and the new Arduino Uno Q can run trained models for tasks like image classification, gesture recognition, and sensor data analysis, all without relying on cloud connectivity. This fusion of AI and embedded systems enables faster, low-power, and privacy-preserving inference right where data is generated. As hardware continues to evolve, the boundary between microcontroller and microprocessor platforms is fading, making on-device intelligence a standard feature rather than an experiment.