Agentic AI
Discover VideoRAG, a framework that fuses graph-based reasoning and multi-modal retrieval to enhance LLMs' ability to understand multi-hour videos efficiently.
Learn how to build an AI agent from scratch using Moondream3 and Gemini. It is a generic, task-based agent that does not rely on application APIs.
Welcome back to our LangGraph series! In our previous post, we explored the fundamental concepts of LangGraph by building a Visual Web Browser Agent that could navigate, see, scroll, and summarize
Developing intelligent agents, using LLMs like GPT-4o, Gemini, etc., that can perform tasks requiring multiple steps, adapt to changing information, and make decisions is a core challenge in AI development.
As AI systems become more specialized, getting them to work together without endless glue code is the next big challenge. That's where Google's A2A Protocol (Agent-to-Agent) steps in, a standardized messaging
In this article, we explore OmniParser, a UI screen parsing pipeline that combines a fine-tuned YOLO model for icon detection with Florence2 for icon recognition and icon description generation.