Tiny Vision Language Models (VLMs) are rapidly transforming the AI landscape. Almost every week, new VLMs with smaller footprints are being released. These models are finding applications across ...
VLM on Edge: Worth the Hype or Just a Novelty?
In 2018, Pete Warden from TensorFlow Lite said, “The future of machine learning is tiny.” Today, with AI moving towards powerful Vision Language Models (VLMs), the need for high computing power has ...
AI for Video Understanding: From Content Moderation to Summarization
The rapid growth of video content has created a need for advanced systems to process and understand this complex data. Video understanding is a critical field in AI, where the goal is to enable ...
Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL
Object detection is predominantly a vision task in which a vision model, such as YOLO, is trained to predict an object's location along with its class. However, it still depends on the pre-trained classes, ...
Fine-Tuning Gemma 3n for Medical VQA on ROCOv2
The release of Gemma 3n, Google's latest family of open nano models, made LLM edge deployment more accessible. Its unique architecture is engineered to address the persistent challenges ...
Nanonets-OCR-s: Enabling Rich, Structured Markdown for Document Understanding
Traditional Optical Character Recognition (OCR) systems are primarily designed to extract plain text from scanned documents or images. While useful, such systems often ignore semantic structure, ...