Tiny Vision Language Models (VLMs) are rapidly transforming the AI landscape. Almost every week, new VLMs with smaller footprints are being released. These models are finding applications across ...
VLM on Edge: Worth the Hype or Just a Novelty?
In 2018, Pete Warden of the TensorFlow Lite team said, “The future of machine learning is tiny.” Today, with AI moving towards powerful Vision Language Models (VLMs), the need for high computing power has ...
Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL
Object detection is predominantly a vision task: we train a vision model, such as YOLO, to predict an object's location along with its class. However, such a model remains limited to its pre-trained classes, ...
SimLingo: Vision-Language-Action Model for Autonomous Driving
SimLingo is a remarkable model that combines autonomous driving, language understanding, and instruction-aware control—all in one unified, camera-only framework. It not only delivered top rankings on ...
Fine-Tuning Gemma 3 VLM using QLoRA for LaTeX-OCR Dataset
Fine-Tuning Gemma 3 allows us to adapt this advanced model to specific tasks, optimizing its performance for domain-specific applications. By leveraging QLoRA (Quantized Low-Rank Adaptation) and ...
Gemma 3: A Comprehensive Introduction
Gemma 3 is the latest addition to Google's family of open models, built from the same research and technology used to create the Gemini models. It is designed to be lightweight yet powerful, enabling ...