The rapid evolution of artificial intelligence, particularly large language models (LLMs), has unlocked unprecedented potential for generating human-like text, solving complex problems, and enhancing ...
OmniParser: Vision Based GUI Agent
The rapid advancement of Vision-Language Models (VLMs) has significantly improved the ability of AI systems to interact with graphical user interfaces (GUIs). However, existing models often struggle ...
NVIDIA AI Summit 2024 – India Overview
The NVIDIA AI Summit 2024, held from October 23 to 25 at the Jio World Convention Centre in Mumbai, marked a significant milestone in India's journey toward becoming a global leader in artificial ...
Handwritten Text Recognition using OCR
Handwritten text documents are ubiquitous in the field of research and study. They are personalized to the user’s needs and are often written in a style that others find difficult to read. This ...
Fine Tuning Whisper on Custom Dataset
Whisper is a leading open-source model used for converting speech to text. Developed by OpenAI, Whisper has been trained on extensive data covering a diverse array of languages and speech conditions. ...
SAM 2 – Promptable Segmentation for Images and Videos
Image segmentation is one of the most fundamental tasks in Computer Vision. Last year, with their Segment Anything Model (SAM), Meta AI put forth the world's first foundation model for image ...