Tiny Vision Language Models (VLMs) are rapidly transforming the AI landscape. Almost every week, new VLMs with smaller footprints are being released. These models are finding applications across diverse fields – agriculture, robotics, manufacturing, healthcare, wellness, forensics, and more. In this article, you will learn how to run a VLM on Jetson Nano using Huggingface Transformers.

Learning Objectives:
- Set up the Jetson Orin Nano with JetPack and the CUDA toolkit
- Install Huggingface Transformers and related libraries
- Run LiquidAI, Moondream2, FastVLM, and SmolVLM models

Table of Contents:
- Why VLM on Jetson Nano?
- How to Setup Jetson Nano for Running VLM
- Huggingface Framework for Inferencing VLM on Jetson Nano
- Inference Using Moondream2
- Inference Using LiquidAI
- Inference Using FastVLM Family from Apple
- Inference Using SmolVLM Family
Why VLM on Jetson Nano?
Previously, we built a Raspberry Pi and Jetson Nano cluster to test various VLMs (and LLMs) on hardware with limited resources. We compared how the various boards perform while running Qwen2.5-VL and Moondream2 using Ollama. Out of all the boards, the Jetson Orin Nano (out of the box) performed best by a huge margin. Check out VLM on Edge: Worth the Hype or Just a Novelty?.
VLMs on the Jetson Orin Nano generate promising results. Given its 8 GB of unified memory, it can also handle some models with parameter counts as high as 7B (8-bit quantized).
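As a rough sanity check on that claim, here is a back-of-envelope estimate of the weight memory a 7B model needs at different precisions. This is only a sketch: it ignores activations, the KV cache, and runtime overhead, so treat the numbers as lower bounds.
# Rough weight-memory estimate for a 7B-parameter model (ignores activations/KV cache)
params = 7e9
for precision, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{precision}: ~{params * bytes_per_param / 1e9:.1f} GB of weights")
# fp16: ~14.0 GB, int8: ~7.0 GB, int4: ~3.5 GB -> only the quantized variants fit in 8 GB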

How to Setup Jetson Nano for Running VLM?
Setting up on edge devices isn’t always straightforward. Dependency conflicts, deprecations, and even unreleased binaries (as of September 2025) can make the process frustrating. That’s where this guide comes in – follow along, and you’ll have your Jetson Orin Nano running VLMs in minutes.
We are using the Jetson Orin Nano Devkit (8 GB) with a 256 GB SSD. One of the perks of this board is that it comes with 2 additional slots for installing M.2 SSDs. Note that it only accepts PCIe NVMe drives; SATA-based drives are not supported. Moreover, it is designed for the Gen 3 interface, so buying a Gen 4 PCIe SSD will not make it faster. Gen 4 drives are supported, but they run at Gen 3 speed. You can also use an SD card, but read/write speeds will be slower.
Q. How to identify a PCIe or SATA drive?
You can check the specifications while buying one. In terms of physical appearance, SATA drives have 2 notches and PCIe drives (usually) have 1 notch.

2.1 Flash Jetson Orin Nano with Jetpack
JetPack on Jetson is a Board Support Package (BSP) that comes bundled with Ubuntu OS, Jetson Platform Services, and the AI stack (CUDA, TensorRT, cuDNN, etc.). We will use the NVIDIA SDK Manager to flash JetPack 6 onto the board. Follow the steps provided on the official website.
Note: JetPack 6 comes with CUDA 12.6, and it cannot simply be swapped for a different CUDA version later.
2.2 Install and Verify Jetson Essentials
Let’s check whether the nvidia-jetpack stack is installed correctly using the first two commands below. If it is not installed, the remaining commands add the NVIDIA APT repositories, install it, and verify the installation.
apt list --installed | grep nvidia-jetpack
dpkg-query --show nvidia-l4t-core
# Add NVIDIA APT Repositories
sudo bash -c 'echo "deb https://repo.download.nvidia.com/jetson/common r34.1 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list'
sudo bash -c 'echo "deb https://repo.download.nvidia.com/jetson/t234 r34.1 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list'
# Update and Upgrade the System
sudo apt update
sudo apt dist-upgrade
# Install Jetpack SDK and Verify
sudo apt install nvidia-jetpack
apt list --installed | grep nvidia-jetpack
apt list --installed | grep cuda-toolkit
apt list --installed | grep cudnn
The command nvidia-smi does not work on the Jetson Orin Nano. We have to install the jtop monitoring utility for similar results. It provides real-time information on CPU/GPU/thermals. Once installed, you may have to reboot the board for jtop to take effect. In our case, we had to restart the service as shown below.
sudo apt update
sudo apt upgrade -y
sudo apt install python3-pip -y
sudo -H pip3 install -U jetson-stats
# sudo systemctl restart jtop.service
# sudo reboot now
# jtop
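Besides the interactive terminal UI, jetson-stats also ships a small Python API, so the same telemetry can be read programmatically. A minimal sketch, assuming jetson-stats is installed and the jtop service is running:
# Read board telemetry via the jetson-stats Python API
from jtop import jtop

with jtop() as jetson:
    if jetson.ok():
        print(jetson.stats)  # dict with CPU/GPU utilization, temperatures, power, etc.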
2.3 Install BLAS + Set CUDA Version
OpenBLAS is an optimized linear algebra library. We also export the CUDA version, as shown below, to match the subsequent installs.
sudo apt-get install -y python3-pip libopenblas-dev
export CUDA_VERSION=12.6
2.4 Install Sparse Matrix Library (cuSPARSELt)
The following commands download and install NVIDIA’s cuSPARSELt repository and add its keyring to the trusted sources. If you are on JetPack 5, you will have CUDA 11.x, and the final command becomes sudo apt -y install cusparselt-cuda-11 in that case.
wget https://developer.download.nvidia.com/compute/cusparselt/0.8.1/local_installers/cusparselt-local-tegra-repo-ubuntu2204-0.8.1_0.8.1-1_arm64.deb
sudo dpkg -i cusparselt-local-tegra-repo-ubuntu2204-0.8.1_0.8.1-1_arm64.deb
sudo cp /var/cusparselt-local-tegra-repo-ubuntu2204-0.8.1/cusparselt-local-tegra-C4CC87E1-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cusparselt-cuda-12
2.5 Install PyTorch, TorchAudio, TorchVision (Jetson-Compatible Wheels)
The compatible pre-built binaries for aarch64 are available on the Jetson AI Lab PyPI index. We are choosing JetPack 6 and CUDA 12.6 in our case. If your versions differ, choose the wheels accordingly.
wget https://pypi.jetson-ai-lab.io/jp6/cu126/+f/590/92ab729aee2b8/torch-2.8.0-cp310-cp310-linux_aarch64.whl#sha256=59092ab729aee2b8937d80cc1b35d1128275bd02a7e1bc911e7efa375bd97226 -O torch-2.8.0-cp310-cp310-linux_aarch64.whl
wget https://pypi.jetson-ai-lab.io/jp6/cu126/+f/de1/5388b8f70e4e1/torchaudio-2.8.0-cp310-cp310-linux_aarch64.whl#sha256=de15388b8f70e4e17a05b23a4ae1f55a288c91449371bb8aeeb69184d40be17f -O torchaudio-2.8.0-cp310-cp310-linux_aarch64.whl
wget https://pypi.jetson-ai-lab.io/jp6/cu126/+f/1c0/3de08a69e9554/torchvision-0.23.0-cp310-cp310-linux_aarch64.whl#sha256=1c03de08a69e95542024477e0cde95fab3436804917133d3f00e67629d3fe902 -O torchvision-0.23.0-cp310-cp310-linux_aarch64.whl
python3 -m pip install numpy
python3 -m pip install --no-cache torch-2.8.0-cp310-cp310-linux_aarch64.whl
python3 -m pip install torchvision-0.23.0-cp310-cp310-linux_aarch64.whl
python3 -m pip install torchaudio-2.8.0-cp310-cp310-linux_aarch64.whl
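After the wheels are installed, a quick check from Python confirms that this Jetson build of PyTorch can actually see the Orin GPU:
# Verify the Jetson PyTorch wheel can see the GPU
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))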
2.6 Install cuDSS (CUDA Sparse Solver)
The CUDA Direct Sparse Solver (cuDSS) is an NVIDIA library designed to solve sparse linear systems of equations on GPUs. Sparse systems appear a lot in scientific computing, optimization, circuit simulation, machine learning, and graph analytics – basically large matrices where most entries are zero. The following commands will download, install, and verify the library.
mkdir -p tmp_cudss && cd tmp_cudss
CUSPARSE_SOLVER_NAME="libcudss-linux-sbsa-0.6.0.5_cuda12-archive"
curl -L -O https://developer.download.nvidia.com/compute/cudss/redist/libcudss/linux-sbsa/${CUSPARSE_SOLVER_NAME}.tar.xz
tar xf ${CUSPARSE_SOLVER_NAME}.tar.xz
sudo cp -a ${CUSPARSE_SOLVER_NAME}/include/* /usr/local/cuda/include/
sudo cp -a ${CUSPARSE_SOLVER_NAME}/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudss
sudo ldconfig
ls /usr/local/cuda/lib64 | grep cudss
Huggingface Framework for Inferencing VLM on Jetson Nano
Huggingface is one of the most popular open-source ecosystems for machine learning. It provides thousands of pre-trained models including VLMs. The models can be easily downloaded and run with just a few lines of code. It is pythonic, and provides finer control for experimenting on models including quantization to increase speed and reduce memory usage.
3.1 Install Core Libraries and Authenticate
Install Huggingface and the core libraries using the command below. You will also have to authenticate to download some gated models from the Hub using the command hf auth login. It will prompt you to enter a token; the first time, follow the link shown in the prompt to generate a new token.
pip install transformers accelerate huggingface_hub
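If you prefer authenticating from Python (for example inside a notebook) instead of the CLI, the huggingface_hub library exposes an equivalent login helper:
# Authenticate to the Hugging Face Hub from Python instead of the CLI
from huggingface_hub import login

login()  # prompts for an access token; alternatively pass login(token="...")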

We will load the models in fp16 precision, or whatever precision the hosted checkpoint uses by default. The models we will test are as follows:
- Moondream2
- LFM2-VL-450M from LiquidAI
- LFM2-VL-1.6B from LiquidAI
- FastVLM-1.5B from Apple
- FastVLM-500M from Apple
- SmolVLM2-2.2B-Instruct from Huggingface
3.2 Install BitsAndBytes Library
Hugging Face Transformers integrates bitsandbytes so you can load large models in 4-bit/8-bit precision with a single flag (load_in_4bit=True or load_in_8bit=True). It reduces GPU memory requirements by 2x to 4x, enabling you to run larger models. Quantization can also improve inference/training speed depending on the hardware and model.
Note: Sometimes quantization makes models slower on CPU.
You cannot install bitsandbytes directly from PyPI on the Jetson Orin Nano because it is not compiled for aarch64. You can compile it from source, but the best way is to use the wheels from Jetson AI Lab.
wget https://pypi.jetson-ai-lab.io/jp6/cu126/+f/d46/6b5819e312dd5/bitsandbytes-0.48.0.dev0-cp310-cp310-linux_aarch64.whl#sha256=d466b5819e312dd5fb7fa4226a074e2d90baed93d479897f7941a32a2a729e12 -O bitsandbytes-0.48.0.dev0-cp310-cp310-linux_aarch64.whl
pip install bitsandbytes-0.48.0.dev0-cp310-cp310-linux_aarch64.whl
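With the wheel installed, quantized loading is just a config object passed to from_pretrained. The snippet below is a minimal sketch using the SmolVLM2 checkpoint we test later; whether 4-bit loading works for a particular VLM depends on how its architecture is implemented in Transformers, so treat this as a starting point rather than a guaranteed recipe.
# Load a VLM in 4-bit precision via bitsandbytes (sketch; model support may vary)
import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 while weights stay 4-bit
)
model = AutoModelForImageTextToText.from_pretrained(
    "HuggingFaceTB/SmolVLM2-2.2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)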
Moondream2 VLM on Jetson Inferencing
Moondream2 is a compact, open-source vision-language model crafted by Vikhyat Korrapati (github username vikhyatk). What makes it impressive is its efficiency. With just 1.86B parameters, Moondream2 balances performance and resource usage. Tasks supported by the model are as follows.
- Image captioning
- Visual question answering (VQA)
- Zero-shot object detection
- Pointing
- OCR and gaze detection, which are handled through the VQA interface rather than native functions
We will be using the following images to test tiny VLMs on the Jetson Orin Nano board: a person sitting in a car, a generic image of a flower and a bird, a pothole image, an image where a person is about to fall after tripping on a cable, and a text-rich image.

4.1 Import Dependencies
import time
from PIL import Image, ImageDraw
# Import matplotlib to plot image outputs
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['image.cmap'] = 'gray'
from transformers import AutoModelForCausalLM, AutoTokenizer
4.2 Load Moondream2 Model
model = AutoModelForCausalLM.from_pretrained(
    "moondream/moondream-2b-2025-04-14",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map="auto",
)
dtype = next(model.parameters()).dtype
print(dtype)
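Before running inference, it is worth confirming the footprint of what we just loaded. The quick check below counts parameters and estimates the weight memory from the loaded dtype (an approximation that ignores activations and the KV cache):
# Count parameters and estimate weight memory of the loaded model
n_params = sum(p.numel() for p in model.parameters())
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Parameters: {n_params / 1e9:.2f}B")
print(f"Approx. weight memory: {n_bytes / 1e9:.2f} GB")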
4.3 Normal Caption using Moondream2
You can also perform short and long captioning by changing the length parameter.
print('Normal caption:')
# Load the hummingbird image (the same bird.jpg used in later sections)
img = Image.open('../tasks/bird.jpg')
t1 = time.time()
normal_caption = model.caption(img, length="normal")["caption"]
for t in normal_caption:
    print(t, end="", flush=True)
t2 = time.time()
diff = t2 - t1
print(f"\nCaption Time : {round(diff,2)}")
Normal caption:
A hummingbird with a green and gray body and a long, slender beak is captured in mid-flight, hovering near a vibrant red flower. The flower, with its numerous small, orange petals, is attached to a green stem. The background is a soft, out-of-focus yellowish-beige, providing a contrast to the vivid colors of the flower and hummingbird. Another red flower is partially visible in the bottom left corner of the image.
Caption Time : 8.14
4.4 VQA using MoonDream – Example 1
# Visual Querying
qimg = Image.open('../tasks/potholes.png')
print("\nVisual query: 'How many potholes are there in the image?'")
print(model.query(qimg, "How many potholes are there in the image?")["answer"])
Visual query: 'How many potholes are there in the image?'
There is one pothole in the image.
4.5 VQA using MoonDream – Example 2

# Visual Querying
qimg = Image.open('../tasks/cable-trip.jpg')
print("\nVisual query: 'Why is the person falling?'")
print(model.query(qimg, "Why is the person falling?")["answer"])
Visual query: 'Why is the person falling?'
The person is falling because they have tripped over a yellow extension cord on the floor. The cord is tangled and lies on the ground, causing the person to lose their balance and fall. This incident highlights the importance of being cautious and aware of one's surroundings, especially in industrial or commercial environments where electrical cords and equipment are present.
4.6 Object Detection using Moondream2 VLM on Jetson
# Object Detection
imgf = Image.open('../tasks/driving-gaze.jpg')
print("\nObject detection: 'face'")
objects = model.detect(imgf, "face")["objects"]
print(f"Found {len(objects)} face(s)")
w, h = imgf.size
Object detection: 'face'
Found 1 face(s)
# Create draw object
draw = ImageDraw.Draw(imgf)
# Loop over bboxes
for bbox in objects:
    # Convert normalized to pixel coords
    x_min = int(bbox['x_min'] * w)
    y_min = int(bbox['y_min'] * h)
    x_max = int(bbox['x_max'] * w)
    y_max = int(bbox['y_max'] * h)
    # Draw rectangle (outline only)
    draw.rectangle([x_min, y_min, x_max, y_max], outline="green", width=3)
    # Optionally add text
    draw.text((x_min, y_min - 15), "Face", fill="green")
plt.figure(figsize = [20, 8])
plt.subplot(121); plt.imshow(imgf); plt.title('Face Detected')
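Moondream2's pointing capability (listed earlier) follows the same pattern as detection. A minimal sketch, assuming the loaded checkpoint exposes the point() method described on its model card, and reusing the driving-gaze image:
# Pointing: returns normalized (x, y) centers for the queried object (sketch)
pimg = Image.open('../tasks/driving-gaze.jpg')
points = model.point(pimg, "face")["points"]
for p in points:
    # Convert normalized coordinates to pixel positions
    print(f"Point at ({int(p['x'] * pimg.width)}, {int(p['y'] * pimg.height)})")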

The notebooks for LiquidAI, FastVLM, and SmolVLM are included in the download code section. We will show one captioning example for each.
LFM2-VL LiquidAI VLM on Jetson Inferencing
Liquid Foundation Models (LFMs) are developed by the company LiquidAI. They are a novel model class built on dynamical systems, signal processing, and numerical linear algebra instead of traditional transformers. Released in mid-2025, LFM2-VL extends their LFM2 text-only models into the vision-language domain. There are two main model variants – LFM2-VL-450M and LFM2-VL-1.6B. Tasks supported by the model are as follows.
- Image captioning
- Visual question answering (VQA)
- Grounding supported through VQA, not native
Let’s take a look at the code to see how the model is loaded and caption inference is executed.
5.1 Import Dependencies and Load LFM2-VL-1.6B
import time

from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
5.2 Prepare Inputs and Run Inference
# Load image and create conversation
image = Image.open("../tasks/bird.jpg")
query = "What do you see in the image? Answer in 100 words."
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": query},
        ],
    },
]
# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)
t1 = time.time()
# Generate output
outputs = model.generate(**inputs, max_new_tokens=128)
t2 = time.time()
# Decode the output
gen = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(gen)
print(f"\n Generation Time : {round(t2-t1, 2)} s")
user
What do you see in the image? Answer in 100 words.
assistant
The image showcases a hummingbird in flight, hovering near a vibrant flower. The hummingbird's wings are blurred, capturing its rapid movement. The flower is striking, with a green stem and a mix of orange and yellow petals. The background is a soft blur of yellow and green, creating a warm, natural setting. This scene beautifully illustrates the symbiotic relationship between hummingbirds and flowers, highlighting the intricate details of both creatures in their natural habitat.
Generation Time : 7.95 s
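Notice that batch_decode returns the whole chat transcript, so the user turn is echoed before the answer. If you only want the assistant's reply, you can slice off the prompt tokens before decoding. The same pattern works unchanged for the smaller LFM2-VL-450M variant; only model_id needs to change.
# Decode only the newly generated tokens, skipping the echoed prompt
prompt_len = inputs["input_ids"].shape[1]
answer = processor.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)[0]
print(answer)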
Inference Using FastVLM Family from Apple
FastVLM is Apple’s groundbreaking vision-language model introduced at CVPR 2025, with an official blog post on July 23, 2025. It stands out because of the following factors.
- Its FastViTHD hybrid encoder, enabling fewer image tokens and extremely fast processing.
- Tremendously reduced time to first token, or TTFT (up to 85× faster vs. prior models).
- Strong performance across various VLM benchmarks while remaining small and efficient.
- Smooth on-device capability, particularly on Apple Silicon, preserving user privacy and enabling offline use.
- A working browser demo that shows the model’s real-time captioning capabilities with the 0.5B variant.
It has three primary variants – 0.5B, 1.5B, and 7B models. Optimized formats such as fp16, int4, and int8 versions are also available on Huggingface. Tasks supported by the model are as follows.
- Image captioning
- Visual question answering (VQA)
- Grounding supported through VQA, not native
6.1 Load FastVLM-1.5B
import time

import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModelForCausalLM

path = "apple/FastVLM-1.5B"
IMAGE_TOKEN_INDEX = -200  # what the model code looks for
# Load
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
6.2 Pre-Processing Steps
messages = [
    {"role": "user", "content": "<image>\nDescribe this image in detail."}
]
rendered = tok.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
pre, post = rendered.split("<image>", 1)
# Tokenize the text *around* the image token (no extra specials!)
pre_ids = tok(pre, return_tensors="pt", add_special_tokens=False).input_ids
post_ids = tok(post, return_tensors="pt", add_special_tokens=False).input_ids
# Splice in the IMAGE token id (-200) at the placeholder position
img_tok = torch.tensor([[IMAGE_TOKEN_INDEX]], dtype=pre_ids.dtype)
input_ids = torch.cat([pre_ids, img_tok, post_ids], dim=1).to(model.device)
attention_mask = torch.ones_like(input_ids, device=model.device)
# Preprocess image via the model's own processor
img = Image.open("../tasks/bird.jpg").convert("RGB")
px = model.get_vision_tower().image_processor(images=img, return_tensors="pt")["pixel_values"]
px = px.to(model.device, dtype=model.dtype)
6.3 Generate Result – Inference
# Generate
t1 = time.time()
with torch.no_grad():
    out = model.generate(
        inputs=input_ids,
        attention_mask=attention_mask,
        images=px,
        max_new_tokens=128,
    )
t2 = time.time()
print(tok.decode(out[0], skip_special_tokens=True))
print(f"Generation Time: {round(t2-t1,2)}")
A vibrant outdoor photograph captures a striking scene dominated by a bright red spike of flowers at the top of the image, contrasting sharply against a yellowish-white background. The focal point of the right side of the image is a blue-tinted hummingbird mid-flight against this backdrop. The bird, with its distinct black eye and long black beak, gracefully hovers directly in front of the flower stalk, as if it’s about to land on the blooms. Its body is a beautiful mosaic of colors, with greenish speckles on the side, and gray streaks along its body.
The flower cluster itself is long and cylindrical,
Generation Time: 13.86
SmolVLM on Jetson Nano
The SmolVLM family comes from Hugging Face itself. The SmolVLM-2B models were released in 2024, followed by SmolVLM-256M and SmolVLM-500M in January 2025 – the smallest VLMs in the world as of today. The video-capable SmolVLM2 models were released in February 2025. Like the models above, the SmolVLM family supports most tasks through the VQA interface, without native task-specific functions.
- Image captioning
- Visual question answering (VQA)
- Grounding supported through VQA, not native
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_path = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = Image.open("../tasks/bird.jpg")
query = "Can you describe this image?"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "path": "../tasks/bird.jpg"},
            {"type": "text", "text": query},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)
generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
print(generated_texts[0])
User:
Can you describe this image?
Assistant: The image depicts a hummingbird in mid-flight, hovering near a flower. The hummingbird is captured in a dynamic pose, with its wings spread wide and its long, thin beak extended towards the flower. The flower itself is an Aloe polyphylla, characterized by its tall, slender stem and a cluster of
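Since SmolVLM2 is also video-capable, the same chat-template interface accepts a video entry in place of the image. The sketch below assumes a short clip at a hypothetical path ../tasks/sample.mp4; depending on your Transformers version, video decoding may require extra packages such as pyav.
# Video description with SmolVLM2 (sketch; the video path is hypothetical)
video_messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "../tasks/sample.mp4"},
            {"type": "text", "text": "Describe this video briefly."},
        ]
    },
]
video_inputs = processor.apply_chat_template(
    video_messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)
video_ids = model.generate(**video_inputs, do_sample=False, max_new_tokens=64)
print(processor.batch_decode(video_ids, skip_special_tokens=True)[0])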
VLM on Jetson Nano Conclusion
Setting up a VLM in a resource-constrained environment is like trying to fit an elephant inside a fridge. Still, the performance on the Jetson Orin Nano is decent, and the speed is somewhat comparable to a consumer GPU like the RTX 3060 12 GB.
So that's all about setting up a pipeline to run VLMs on the Jetson Nano. I hope you enjoyed reading the post. With the right setup and Hugging Face support, models like Moondream2, LiquidAI's LFM2-VL, Apple's FastVLM, and Huggingface's SmolVLM can run decently on this compact board, each offering different strengths in speed, size, and capability. The Orin Nano is a promising, capable platform for real-time, private, and low-power AI applications.