Generative AI

Vision Language Action Models (VLA) Overview: LeRobot Policies Demo

The advent of Generative AI has fundamentally transformed robotic intelligence, enabling significant strides in how advanced humanoid robots “perceive, reason and act” in the physical world. This huge progress is

Generative AI, Robotics, Vision Language Models

Fine-Tuning Gemma 3 VLM using QLoRA for LaTeX-OCR Dataset

Fine-tuning Gemma 3 allows us to adapt this advanced model to specific tasks, optimizing its performance for domain-specific applications. By leveraging QLoRA (Quantized Low-Rank Adaptation) and Transformers, we can efficiently

Computer Vision, Generative Models, LLMs, Vision Language Models

Diving into the Nodes: An Introduction to ComfyUI for Stable Diffusion

ComfyUI is a powerful, node-based graphical user interface (GUI) that offers flexibility and transparency when working with Stable Diffusion models. This article provides an introduction to ComfyUI, covering installation and

AI Art Generation, Computer Vision, Diffusion Models, Generative AI

Introduction to GPT-4o Image Generation – Here’s What You Need to Know

GPT-4o image generation is a game-changer! With native support in ChatGPT, you can now create stunning visuals from text prompts, refine them, and explore styles like Studio Ghibli or photorealism.

AI Art Generation, Computer Vision, Deep Learning, Diffusion Models, Generative AI, Generative Models, Transformer Neural Networks

Gemma 3: A Comprehensive Introduction

Gemma 3 is the latest addition to Google’s family of open models, built from the same research and technology used to create the Gemini models. It is designed to be

Generative Models, LLMs, Vision Language Models

DDIM: The Faster, Improved Version of DDPM for Efficient AI Image Generation

Diffusion models have changed the game in image generation. Tools like Stable Diffusion have become popular for their ability to turn text into images using these models. The core idea

Computer Vision, Diffusion Models, Generative AI, Generative Models

Introduction to Model Context Protocol (MCP)

Model Context Protocol (MCP) is a new standard from Anthropic for connecting LLMs with different applications via a client-server protocol.

Artificial Intelligence, Generative AI, LLMs

GraphRAG: The Practical Guide for Cost-Effective Document Analysis with Knowledge Graphs

GraphRAG is a research project from Microsoft that addresses the shortcomings of naive RAG by building a structured knowledge graph of entities, relations, and claims, which supports traceability through multi-hop traversal.

Generative AI, LLMs, NLP, RAGs

OmniParser: Vision Based GUI Agent

In this article, we explore OmniParser, a UI screen-parsing pipeline that combines a fine-tuned YOLO model for icon detection with Florence-2 for icon recognition and description generation.

Agentic AI, Generative AI, OCR, Vision Language Models

Agentic AI: An Introduction to Autonomous Intelligent Systems

No longer confined to passive algorithms, AI is evolving into autonomous agents that can perceive, reason, and act with increasing intelligence. These agents are designed to navigate uncertainty,

Agentic AI, Deep Learning, Generative AI, LLMs, RAGs

Fine-tuning Stable Diffusion 3.5: UI images

Interest in fine-tuning Stable Diffusion models has recently surged among AI enthusiasts and professionals, driven by the need to adapt these models to specific requirements. This article walks you

AI Art Generation, Deep Learning, Diffusion Models, Generative AI, Generative Models, Neural Attention, PyTorch, UI

Molmo VLM AI: Paper Explanation and Demo Applications – AllenAI (Ai2)

Molmo VLM is an open-source Vision-Language Model (VLM) showcasing exceptional capabilities in tasks like pointing, counting, VQA, and clock face recognition. Leveraging the meticulously curated PixMo dataset and a well-optimized

Computer Vision, LLMs, Segmentation, Vision Language Models