Computer Vision

Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL

What if object detection wasn't just about drawing boxes, but about having a conversation with an image? Dive deep into the world of Vision Language Models (VLMs) and see how

Computer Vision, LLMs, NLP, Vision Language Models, VLMs

SimLingo: Vision-Language-Action Model for Autonomous Driving

SimLingo is a remarkable model that combines autonomous driving, language understanding, and instruction-aware control—all in one unified, camera-only framework. It not only delivered top rankings on CARLA Leaderboard 2.0 and

Advanced Driver Assistance Systems, Autonomous Vehicle, Computer Vision, Robotics, VLMs

V-JEPA 2: Meta’s Breakthrough in AI for the Physical World

The ultimate goal for many in artificial intelligence is to build agents that can perceive, reason, and act in our complex physical world. Meta AI has made a significant stride

Computer Vision, Generative AI, Generative Models, Hugging Face Transformers, Multimodal Models, Robotics, Vision Language Models

DINOv2 by Meta: A Self-Supervised foundational vision model

The field of computer vision is fueled by the remarkable progress in self-supervised learning. At the forefront of this revolution is DINOv2, a cutting-edge self-supervised vision transformer developed by Meta

Computer Vision, Self-Supervised Learning

GraphRAG: The Practical Guide for Cost-Effective Document Analysis with Knowledge Graphs

GraphRAG is a pivotal piece of research from Microsoft that addresses the shortcomings of naive RAG by employing a structured knowledge graph, which includes entities, relations, claims, etc., and supports traceability by traversing multi-hop nodes.

Generative AI, LLMs, NLP, RAGs

Image Captioning using ResNet and LSTM

Image Captioning using ResNet and LSTM bridges vision and language, enabling machines to "see" images and "describe" them in text. This model powers applications like accessibility for visually impaired users,

Computer Vision, Deep Learning, NLP

Training 3D U-Net for Brain Tumor Segmentation Challenge – Medical Imaging

This article discusses training a 3D U-Net for brain tumor segmentation (glioma detection) on BraTS2023. It touches upon the importance of 3D U-Net over 2D U-Net for MRI brain scans.

3D Computer Vision, Computer Vision, Deep Learning, Medical Imaging

DETR: Overview and Inference

This blog post goes through the architecture of DETR.

Computer Vision, Object Detection, PyTorch

Sapiens: Foundation for Human Vision Models by Meta

The article primarily discusses the capabilities of Sapiens, a foundational human vision model by Meta that achieves state-of-the-art performance in tasks like 2D pose estimation, body-part segmentation, and normal and depth estimation.

3D Computer Vision, Computer Vision, Deep Learning, Generative AI, SpatialAI-Depth

ColPali: Enhancing Financial Report Analysis with Multimodal RAG and Gemini

Performing RAG on unstructured elements, especially in complex PDFs such as financial and legal reports, is challenging. ColPali, a novel document retrieval approach, achieves SOTA results with high-quality retrieval. This

Computer Vision, LLMs, RAGs, Vision Language Models

Introduction to Feature Matching Using Neural Networks

Feature matching using deep learning is a game-changer for computer vision tasks like panorama stitching, video stabilization, and face recognition, providing greater accuracy and reliability. Dive into how this technology

Computer Vision, Deep Learning, Feature Detection, Neural Network

CVPR 2024 Key Research & Dataset Papers – Part 2

This article gives an overview of the key research papers and datasets from CVPR 2024, along with repository links.

AI Research Papers, Computer Vision, Deep Learning