Computer Vision | LearnOpenCV

YOLO11: Faster Than You Can Imagine!

October 8, 2024

YOLO11 is finally here, unveiled at the Ultralytics YOLO Vision 2024 (YV24) event. 2024 has been a year of YOLO models: after the release of YOLOv8 in 2023, we got YOLOv9 and YOLOv10 this year, and ...

Exploring DINO: Self-Supervised Transformers for Road Segmentation with ResNet50 and U-Net

October 1, 2024

DINO is a self-supervised learning (SSL) framework that uses the Vision Transformer (ViT) as its core architecture. While SSL initially gained popularity through its use in natural language ...

Sapiens: Foundation for Human Vision Models by Meta

September 24, 2024

Sapiens, a family of foundational human vision models by Khirodkar et al. from Meta, achieves state-of-the-art results on human-centric tasks such as 2D pose estimation, body-part segmentation, depth ...

ColPali: Enhancing Financial Report Analysis with Multimodal RAG and Gemini

September 17, 2024

ColPali-based multimodal RAG offers a novel approach to efficiently retrieving elements such as images, tables, charts, and text by treating each page as an image. This method takes advantage of Vision ...

Training CLIP Model from Scratch for an Image Retrieval App

August 27, 2024

Contrastive Language-Image Pre-training (CLIP) by OpenAI is a model that connects text and images, allowing it to recognize and categorize images without task-specific training for each category. ...

Introduction to LiDAR SLAM: LOAM and LeGO-LOAM Paper and Code Explanation with ROS 2 Implementation

August 20, 2024

LiDAR SLAM is a crucial component in robotics perception, widely used in both industry and academia for its efficiency and robustness in localization and mapping. In robotics perception research, ...
