News

What Makes DeepSeek OCR So Powerful?

An explanation of the DeepSeek OCR paper, with tests using Transformers and a vLLM pipeline. Understand Context Optical Compression and the model architecture in depth.

Generative AI, OCR, Text Recognition

2D Gaussian Splatting: Geometrically Accurate Radiance Field Reconstruction

Discover how 2D Gaussian Splatting transforms neural rendering by replacing volumetric 3D Gaussians with surface-aligned 2D disks.

3D Computer Graphics, 3D Computer Vision, 3D Reconstruction

TRM: Tiny AI Models beating Giants on Complex Puzzles

Models with billions, or even trillions, of parameters are becoming the norm. These models can write essays, generate code, and create art. But they can still get stuck on

Neural Attention, Transformer Neural Networks

Deploying ML on Arduino: From Blink to Think

Deploying ML on Arduino Nano 33 BLE. Explore TinyML techniques, setup steps, and why older Arduinos still rival the new Arduino Uno Q.

Deployment, Edge Devices, Image Classification, Machine Learning

VideoRAG: Redefining Long-Context Video Comprehension

Discover VideoRAG, a framework that fuses graph-based reasoning and multi-modal retrieval to enhance LLMs' ability to understand multi-hour videos efficiently.

Agentic AI, LLMs, RAGs, Video Analysis, Vision Language Models

AI Agent in Action: Automating Desktop Tasks with VLMs

Learn how to build an AI agent from scratch using Moondream3 and Gemini. It is a generic, task-based agent that does not depend on application APIs.

Agentic AI, GUI, VLMs

The Ultimate Guide To VLM Evaluation Metrics, Datasets, And Benchmarks

Get a comprehensive overview of VLM evaluation metrics, benchmarks, and datasets for tasks like VQA, OCR, and image captioning.

Computer Vision, VLMs

Getting Started with VLM on Jetson Nano

Learn how to set up a pipeline to run VLMs on Jetson Nano using Hugging Face Transformers. Run models like LiquidAI, Moondream2, FastVLM, and SmolVLM.

Edge Devices, Jetson Nano, VLM on Jetson Nano, VLMs

VLM on Edge: Worth the Hype or Just a Novelty?

Testing Vision Language Models (VLMs) on edge devices. See how small VLMs perform on our custom Raspberry Pi cluster and Jetson Nanos.

Edge Devices, Jetson Nano, Raspberry Pi, VLMs

AnomalyCLIP: Harnessing CLIP for Weakly-Supervised Video Anomaly Recognition

Video Anomaly Detection (VAD) is one of the most challenging problems in computer vision. It involves identifying rare, abnormal events in videos – such as burglary, fighting, or accidents –

Anomaly Detection, Vision Transformer, VLMs

AI for Video Understanding: From Content Moderation to Summarization

The rapid growth of video content has created a need for advanced systems to process and understand this complex data. Video understanding is a critical field in AI, where the

Computer Vision, Generative AI, Video Analysis, Vision Language Models

Video-RAG: Training-Free Retrieval for Long-Video LVLMs

Learn how Video-RAG boosts training-free, low-compute long-video understanding by pairing OCR, ASR, and open-vocabulary detection with any long-video LVLM.

RAGs, VLMs