Computer Vision

Image Captioning using ResNet and LSTM

Image Captioning using ResNet and LSTM bridges vision and language, enabling machines to see images and describe them in text. This model powers applications like accessibility for visually impaired users

Computer Vision, Deep Learning, NLP

Molmo VLM AI : Paper Explanation and Demo Applications – AllenAI (Ai2)

Molmo VLM is an open-source Vision Language Model (VLM) showcasing exceptional capabilities in tasks like pointing, counting, VQA, and clock face recognition. Leveraging the meticulously curated PixMo dataset and

Computer Vision, LLMs, Segmentation, Vision Language Models

3D Gaussian Splatting Introduction – Paper Explanation & Training on Custom Datasets with NeRF Studio Gsplats

3D Gaussian Splatting (3DGS) is redefining the landscape of 3D computer graphics and vision, but here's a twist: it achieves groundbreaking results without relying on any neural networks, not

3D Computer Graphics, 3D Computer Vision, 3D Reconstruction, Robotics, SLAM

Contrastive Learning – SimCLR and BYOL (With Code Example)

Supervised learning has been dominant for years, but its reliance on labeled data, a costly and time-consuming resource, creates challenges, especially in areas like medical imaging. On the other

Computer Vision, Contrastive Learning, Deep Learning, Self-Supervised Learning

The Annotated NeRF – Training on Custom Dataset from Scratch in PyTorch

In recent years, the field of 3D from multi-view has become one of the most popular topics in computer vision conferences, with a high number of submitted papers each