YOLO11 is finally here, revealed at the exciting Ultralytics YOLO Vision 2024 (YV24) event. 2024 is a year of YOLO models. After the release of YOLOv8 in 2023, we got YOLOv9 and YOLOv10 this year, and ...
Exploring DINO: Self-Supervised Transformers for Road Segmentation with ResNet50 and U-Net
DINO is a self-supervised learning (SSL) framework that uses the Vision Transformer (ViT) as it's core architecture. While SSL initially gained popularity through its use in natural language ...
Sapiens: Foundation for Human Vision Models by Meta
Sapiens, a family of foundational Human Vision Models by Rawal et al., from Meta, achieves state-of-the-art results for human centric tasks like 2D pose estimation, body-part segmentation, depth ...
ColPali: Enhancing Financial Report Analysis with Multimodal RAG and Gemini
ColPali multimodal RAG offers a novel approach for efficient retrieval of elements such as images, tables, charts, and texts by treating each page as an image. This method takes advantage of Vision ...
Training CLIP Model from Scratch for an Image Retrieval App
Contrastive Language Image Pretraining (CLIP) by OpenAI is a model that connects text and images, allowing it to recognize and categorize images without needing specific training for each category. ...
Introduction to LiDAR SLAM: LOAM and LeGO-LOAM Paper and Code Explanation with ROS 2 Implementation
LiDAR SLAM is a crucial component in robotics perception, widely used in both industry and academia for its efficiency and robustness in localization and mapping. In robotics perception research, ...