News
Yet another SOTA model from Meta: meet SAM-3. Learn what's new and how to implement your own tracking pipeline using SAM-3.
Discover Image-GS, an image representation framework based on adaptive 2D Gaussians, outperforming neural and classical codecs in terms of real-time efficiency.
vLLM Paper Explained. Understand how PagedAttention and continuous batching work, along with other optimizations vLLM has added over time.
Processing long documents with VLMs or LLMs poses a fundamental challenge: input size exceeds context limits. Even GPUs as large as 12 GB can barely process 3-4 pages at
DeepSeek OCR Paper Explanation and Test using Transformers and vLLM Pipeline. Understand Context Optical Compression and the model architecture in depth.
Discover how 2D Gaussian Splatting transforms neural rendering by replacing volumetric 3D Gaussians with surface-aligned 2D disks.
Models with billions, or even trillions, of parameters are becoming the norm. These models can write essays, generate code, and create art. But they can still get stuck on
Deploying ML on Arduino Nano 33 BLE. Explore TinyML techniques, setup steps, and why older Arduinos still rival the new Arduino Uno Q.
Discover VideoRAG, a framework that fuses graph-based reasoning and multi-modal retrieval to enhance LLMs' ability to understand multi-hour videos efficiently.
Learn how to build an AI agent from scratch using Moondream3 and Gemini. It is a generic, task-based agent that does not depend on application APIs.
Get a comprehensive overview of VLM evaluation metrics, benchmarks, and datasets for tasks like VQA, OCR, and image captioning.
Learn how to set up a pipeline to run VLMs on a Jetson Nano using Hugging Face Transformers. Run models like LiquidAI, Moondream2, FastVLM, and SmolVLM.