Ankan Ghosh

Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL

What if object detection wasn't just about drawing boxes, but about having a conversation with an image? Dive deep into the world of Vision Language Models (VLMs) and see how

Computer Vision, LLMs, NLP, Vision Language Models, VLMs

FineTuning Gemma 3n for Medical VQA on ROCOv2

What if a radiologist facing a complex scan in the middle of the night could ask an AI assistant for a second opinion, right from their local workstation? This isn't

Computer Vision, Generative AI, Generative Models, LLMs, Multimodal Models, NLP, Transformer Neural Networks, Vision Language Models, Vision Transformer, VLMs

MedGemma: Google’s Medico VLM for Clinical QA, Imaging, and More

Imagine an AI co-pilot for every clinician, capable of understanding both complex medical images and dense clinical text. That's the promise of MedGemma, Google's new Vision-Language Model specifically trained for

Generative AI, LLMs, Vision Language Models, VLMs

GR00T N1.5 Explained: NVIDIA’s VLA Model for Humanoids

Dive into NVIDIA's GR00T N1.5, a groundbreaking open foundation model poised to revolutionize humanoid robotics! Discover how this advanced Vision-Language-Action (VLA) model, with its smarter architecture and innovative training using

Robotics, Vision Language Models, Vision Transformer

SmolVLA: Affordable & Efficient VLA Robotics on Consumer GPUs

Imagine you’re a robotics enthusiast, a student, or even a seasoned developer, and you’ve been captivated by the idea of robots that can see, understand our language, and then act on that

Robotics, Vision Language Models, Vision Transformer

Getting Started with Qwen3 – The Thinking Expert

Discover Qwen3, Alibaba’s open-source thinking LLM. Switch between fast replies and chain-of-thought reasoning with 128K context and MoE efficiency. Learn how to use and fine-tune it.

Generative AI, Language Models, LLMs, NLP

MedSAM2 Explained: One Prompt to Segment Anything in Medical Imaging

MedSAM2 brings “segment anything” power to healthcare, carving organs, tumours, and even moving heart chambers from CT, MRI, PET, and live ultrasound with a single prompt. Running in < 1

3D Computer Vision, Computer Vision, Image Segmentation

Google’s A2A Protocol: Here’s What You Need to Know

As AI systems become more specialized, getting them to work together without endless glue code is the next big challenge. That’s where Google’s A2A Protocol (Agent-to-Agent) steps in—a standardized messaging

Agentic AI, Deep Learning

Introduction to GPT-4o Image Generation – Here’s What You Need to Know

GPT-4o image generation is a game-changer! With native support in ChatGPT, you can now create stunning visuals from text prompts, refine them, and explore styles like Studio Ghibli or photorealism.

AI Art Generation, Computer Vision, Deep Learning, Diffusion Models, Generative AI, Generative Models, Transformer Neural Networks

YOLO11 on Raspberry Pi: Optimizing Object Detection for Edge Devices

Imagine you have multiple warehouses in different places, you don’t have time to monitor everything at once, and you can’t afford a lot of compute due to their

Computer Vision, Edge Devices, Object Detection, Object Tracking, Raspberry Pi, YOLO

FineTuning RetinaNet for Wildlife Detection with PyTorch: A Step-by-Step Tutorial

A comprehensive step-by-step guide on fine-tuning RetinaNet using PyTorch to achieve 79% accuracy on wildlife detection tasks. In this tutorial, we dive deep into RetinaNet’s architecture, explain the benefits of

Computer Vision, Deep Learning, Object Detection

FineTuning SAM2 for Leaf Disease Segmentation – Step-by-Step Tutorial

Leaf diseases reduce crop yields and impact food security. Finetuning SAM2 helps detect and segment diseased areas using deep learning. With a small dataset, we achieved 74% IoU, making early

Computer Vision, Deep Learning, Image Segmentation