News

Object Insertion in Gaussian Splatting: Paper Explanation and Training of MCMC in Gsplat

3D Gaussian splatting (3DGS) has recently gained recognition as a groundbreaking approach in radiance fields and computer graphics. It stands out as a jack of all trades, addressing challenges that

3D Computer Graphics, 3D Computer Vision, 3D Reconstruction, Computer Vision, Robotics, SLAM

Depth Pro: The Sharp Monocular Metric Depth Estimation from Apple Explanation and Applications

Apple's DepthPro is quite impressive, producing pixel-perfect, high-resolution metric depth maps with sharp boundaries through monocular depth estimation. It outperforms all of its contenders like Metric3D v2 and DepthAnything in

3D Computer Vision, Computer Vision, Deep Learning, SpatialAI-Depth

Fine-tuning Stable Diffusion 3.5: UI images

Interest in fine-tuning Stable Diffusion models has recently surged among AI enthusiasts and professionals, driven by the need to adapt these models to specific requirements. This article walks you

AI Art Generation, Deep Learning, Diffusion Models, Generative AI, Generative Models, Neural Attention, PyTorch, UI

SimSiam: Streamlining SSL with Stop-Gradient Mechanism

SimSiam simplifies Self-Supervised Learning by eliminating the need for negative samples and momentum encoders. Using a dual-branch Siamese network and a stop-gradient mechanism, it prevents representation collapse while achieving competitive

Contrastive Learning, Self-Supervised Learning

Image Captioning using ResNet and LSTM

Image Captioning using ResNet and LSTM bridges vision and language, enabling machines to "see" images and "describe" them in text. This model powers applications like accessibility for visually impaired users,

Computer Vision, Deep Learning, NLP

Molmo VLM AI: Paper Explanation and Demo Applications – AllenAI (Ai2)

Molmo VLM is an open-source Vision-Language Model (VLM) showcasing exceptional capabilities in tasks like pointing, counting, VQA, and clock face recognition. Leveraging the meticulously curated PixMo dataset and a well-optimized

Computer Vision, LLMs, Segmentation, Vision Language Models

3D Gaussian Splatting Introduction – Paper Explanation & Training on Custom Datasets with NeRF Studio Gsplats

3D Gaussian Splatting (3DGS) is redefining the landscape of 3D computer graphics and vision — but here’s a catch: it achieves groundbreaking results without relying on any neural networks, not

3D Computer Graphics, 3D Computer Vision, 3D Reconstruction, Robotics, SLAM

FLUX AI Image Generation: Experimenting with the Parameters

Image generation has become a fascinating field in AI, offering tools to create astounding visuals with minimal effort. The Flux AI image generation model, an open-source model developed by Black Forest

AI Art Generation, Diffusion Models, Generative AI, Generative Models

Contrastive Learning – SimCLR and BYOL (With Code Example)

Supervised Learning has been dominant for years, but its reliance on labeled data—a costly and time-consuming resource—creates challenges, especially in areas like medical imaging. On the other hand, Unsupervised Learning,

Computer Vision, Contrastive Learning, Deep Learning, Self-Supervised Learning

The Annotated NeRF – Training on Custom Dataset from Scratch in PyTorch

In recent years, 3D reconstruction from multi-view images has become one of the most popular topics at computer vision conferences, with a high number of papers submitted each year.

3D Computer Graphics, 3D Computer Vision, 3D Reconstruction, Deep Learning, PyTorch, Robotics, SLAM

Stable Diffusion 3.5: Paper Explanation and Inference

Stable Diffusion 3.5, released in October 2024 by Stability AI, is the third iteration in the Stable Diffusion family. The Turbo-Large and Large variants of the SD3.5 family are Stability

AI Art Generation, Diffusion Models, Generative AI, Hugging Face Transformers

LightRAG: Simple and Fast Alternative to GraphRAG for Legal Doc Analysis

This article discusses the architecture of LightRAG from HKU, exploring its internal workings in depth and comparing it with GraphRAG and NaiveRAG for local document analysis.

Deep Learning, Generative AI, LLMs, RAGs