Agentic AI

VideoRAG: Redefining Long-Context Video Comprehension

Discover VideoRAG, a framework that fuses graph-based reasoning and multi-modal retrieval to enhance LLMs' ability to understand multi-hour videos efficiently.

Agentic AI, LLMs, RAGs, Video Analysis, Vision Language Models

AI Agent in Action: Automating Desktop Tasks with VLMs

Learn how to build an AI agent from scratch using Moondream3 and Gemini. It is a generic, task-based agent that does not depend on application APIs.

Agentic AI, GUI, VLMs

LangGraph: Building Self-Correcting RAG Agent for Code Generation

Welcome back to our LangGraph series! In our previous post, we explored the fundamental concepts of LangGraph by building a Visual Web Browser Agent that could navigate, see, scroll, and summarize

Agentic AI, AI Art Generation, Computer Vision, Generative AI, Generative Models, Hugging Face Transformers, Multimodal Models, Vision Language Models

Building an Agentic Browser with LangGraph: A Visual Automation and Summarization Pipeline

Developing intelligent agents that use LLMs such as GPT-4o and Gemini to perform multi-step tasks, adapt to changing information, and make decisions is a core challenge in AI development.

Agentic AI, Computer Vision, Generative AI, Generative Models, LLMs, VLMs

Google’s A2A Protocol: Here’s What You Need to Know

As AI systems become more specialized, getting them to work together without endless glue code is the next big challenge. That’s where Google’s A2A Protocol (Agent-to-Agent) steps in—a standardized messaging

Agentic AI, Deep Learning

OmniParser: Vision Based GUI Agent

In this article, we explore OmniParser, a UI screen-parsing pipeline that combines a fine-tuned YOLO model for icon detection with Florence2 for icon recognition and description generation.

Agentic AI, Generative AI, OCR, Vision Language Models

Agentic AI: An Introduction to Autonomous Intelligent Systems

No longer confined to passive algorithms, AI is evolving into autonomous agents that can perceive, reason, and act with increasing intelligence. These agents are designed to navigate uncertainty,

Agentic AI, Deep Learning, Generative AI, LLMs, RAGs