Object Detection is predominantly a vision task where we train a vision model, like YOLO, to predict the location of the object along with its class. But still it depends on the pre-trained classes, ...
FineTuning Gemma 3n for Medical VQA on ROCOv2
The release of Gemma 3n, Google's latest family of open nano models, made LLM edge deployment more accessible. Its unique architecture is engineered to address the persistent challenges ...
MedGemma: Google’s Medico VLM for Clinical QA, Imaging, and More
Picture this: Dr. Aris, a radiologist with a decade of experience, is staring at his screen. The stack of digital files, chest X-rays, and CT scans seems endless. Each image holds a story, a clue to a ...
SmolVLA: Affordable & Efficient VLA Robotics on Consumer GPUs
Imagine you're a robotics enthusiast, a student, or even a seasoned developer, and you've been captivated by the idea of robots that can see, understand our language, and then act on that ...
Vision Language Action Models (VLA) Overview: LeRobot Policies Demo
The advent of Generative AI, has fundamentally transformed robotic intelligence, enabling significant strides in how advanced humanoid robots "perceive, reason and act" in the physical world. This ...