Sovit Rath

Introduction to Model Context Protocol (MCP)

Model Context Protocol (MCP) is a new standard by Anthropic to connect LLMs with different applications via a server-client protocol.

Artificial Intelligence, Generative AI, LLMs

OmniParser: Vision Based GUI Agent

In this article, we explore OmniParser, a UI screen parsing pipeline that combines a fine-tuned YOLO model for icon detection with Florence2 for icon recognition and icon description generation.

Agentic AI, Generative AI, OCR, Vision Language Models

NVIDIA AI Summit 2024 – India Overview

The NVIDIA AI Summit 2024, held from October 23 to 25 at the Jio World Convention Centre in Mumbai, marked a significant milestone in India’s journey toward becoming a global

Artificial Intelligence, Deep Learning, NVIDIA

Handwritten Text Recognition using OCR

In this article, we carry out handwritten text recognition using OCR. We fine-tune the TrOCR model on the GNHK dataset.

Deep Learning, Hugging Face Transformers, OCR

Fine Tuning Whisper on Custom Dataset

In this article, we fine-tune the Whisper ASR model on a custom dataset to recognize Air Traffic Control audio.

Hugging Face Transformers, Speech Recognition, Training Neural Networks

SAM 2 – Promptable Segmentation for Images and Videos

In this article, we explore SAM 2 (Segment Anything Model 2) for promptable visual segmentation of objects in images and videos.

Image Segmentation, Segmentation

Dreambooth using Diffusers

In this article, we use the Dreambooth technique to train Stable Diffusion 1.5 and teach it to generate images of a very specific species of cat.

AI Art Generation, Generative AI, Generative Models, Hugging Face Transformers

Introduction to Hugging Face Diffusers

In this article, we cover the Hugging Face Diffusers library for text-to-image, image-to-image, and image inpainting.

Diffusion Models, Generative AI, Generative Models, Hugging Face Transformers

Text Summarization using T5: Fine-Tuning and Building Gradio App

In this article, we fine-tune T5 for text summarization and build a Text Summarization Gradio app with the fine-tuned model.

Hugging Face Transformers, NLP, Transformer Neural Networks

Fine Tuning T5: Text2Text Transfer Transformer for Building a Stack Overflow Tag Generator

In this article, we fine-tune the T5 model for Stack Overflow tag generation using the Hugging Face Transformers library.

Hugging Face Transformers, Language Models, NLP, PyTorch

Fine-Tuning BERT using Hugging Face Transformers

In this post, we fine-tune BERT on an arXiv abstract classification dataset using the Hugging Face Transformers library.

Hugging Face Transformers, Language Models, NLP

BERT: Bidirectional Encoder Representations from Transformers – Unlocking the Power of Deep Contextualized Word Embeddings

In this article, we introduce BERT, including its architecture, pretraining strategy, and inference.

Hugging Face Transformers, Language Models, NLP