PyTorch

FineTuning Gemma 3n for Medical VQA on ROCOv2

What if a radiologist facing a complex scan in the middle of the night could ask an AI assistant for a second opinion, right from their local workstation? This isn't

Computer Vision, Generative AI, Generative Models, LLMs, Multimodal Models, NLP, Transformer Neural Networks, Vision Language Models, Vision Transformer, VLMs

Optimizing VJEPA-2: Tackling Latency & Context in Real-Time Video Classification Scripts

The domain of video understanding is rapidly evolving, with models capable of interpreting complex actions and interactions within video streams. Meta AI’s VJEPA-2 (Video Joint Embedding Predictive Architecture) stands out

Generative AI, video classification, Vision Language Models

V-JEPA 2: Meta’s Breakthrough in AI for the Physical World

The ultimate goal for many in artificial intelligence is to build agents that can perceive, reason, and act in our complex physical world. Meta AI has made a significant stride

Computer Vision, Generative AI, Generative Models, Hugging Face Transformers, Multimodal Models, Robotics, Vision Language Models

Distributed Parallel Training: PyTorch Multi-GPU Setup in Kaggle T4x2

Training large models on a single GPU is limited by memory constraints. Distributed training enables scalable training across multiple GPUs.

GPUs, PyTorch, Training Neural Networks

Model Weights File Formats in Machine Learning

As Machine Learning and AI technologies continue to advance, the need for efficient and secure methods to store, share, and deploy trained models becomes increasingly critical. Model weights file formats

Deployment, Machine Learning

DINOv2 by Meta: A Self-Supervised foundational vision model

The field of computer vision is fueled by the remarkable progress in self-supervised learning. At the forefront of this revolution is DINOv2, a cutting-edge self-supervised vision transformer developed by Meta

Computer Vision, Self-Supervised Learning

FineTuning SAM2 for Leaf Disease Segmentation – Step-by-Step Tutorial

Leaf diseases reduce crop yields and impact food security. Finetuning SAM2 helps detect and segment diseased areas using deep learning. With a small dataset, we achieved 74% IoU, making early

Computer Vision, Deep Learning, Image Segmentation

Image Captioning using ResNet and LSTM

Image Captioning using ResNet and LSTM bridges vision and language, enabling machines to "see" images and "describe" them in text. This model powers applications like accessibility for visually impaired users,

Computer Vision, Deep Learning, NLP

DETR: Overview and Inference

This blog post walks through the architecture of DETR.

Computer Vision, Object Detection, PyTorch

Sapiens: Foundation for Human Vision Models by Meta

This article discusses the capabilities of Sapiens, a foundational human vision model by Meta that achieves state-of-the-art performance in tasks such as 2D pose estimation, body-part segmentation, and normal and depth estimation.

3D Computer Vision, Computer Vision, Deep Learning, Generative AI, SpatialAI-Depth

Fine-tuning Faster R-CNN on Sea Rescue Dataset – Small Object Detection: PyTorch

This research article discusses how data preparation matters when fine-tuning Faster R-CNN for aerial small object detection.

Computer Vision, Deep Learning, Object Detection

YOLO Loss Function Part 2: GFL and VFL Loss

In the preceding article, YOLO Loss Functions Part 1, we focused exclusively on SIoU and Focal Loss as the primary loss functions used in the YOLO series of models. In

Computer Vision, Deep Learning, Focal Loss, GFL, Loss Function, Object Detection, SIoU Loss Functions, VFL, YOLO