NLP

What if object detection wasn't just about drawing boxes, but about having a conversation with an image? Dive deep into the world of Vision Language Models (VLMs).

In the groundbreaking 2017 paper “Attention Is All You Need”, Vaswani et al. introduced sinusoidal position embeddings to help Transformers encode positional information without recurrence or convolution. This elegant, non-learned encoding remains a staple of the architecture.
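The encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula from the paper (sine on even dimensions, cosine on odd dimensions, wavelengths geometrically spaced by the constant 10000), not a drop-in module from any particular framework; the function name is ours.

```python
import numpy as np

def sinusoidal_position_embeddings(seq_len, d_model):
    """Fixed (non-learned) position embeddings as in "Attention Is All You Need".

    Even dimensions get sin, odd dimensions get cos, with frequencies
    1 / 10000^(2i / d_model) for dimension pair index i.
    """
    positions = np.arange(seq_len)[:, None]           # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # shape (1, d_model // 2)
    angles = positions / (10000 ** (dims / d_model))  # shape (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_position_embeddings(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

Because the table is a pure function of position, it can be precomputed once and simply added to the token embeddings, with no parameters to train.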

Self-attention, the beating heart of Transformer architectures, treats its input as an unordered set. That mathematical elegance is also a curse: without an extra signal, the model has no idea which token comes before which.

What if a radiologist facing a complex scan in the middle of the night could ask an AI assistant for a second opinion, right from their local workstation?

SigLIP-2 represents a significant step forward in multilingual vision-language encoders, building on the foundations of SigLIP with enhanced semantic understanding, localization, and dense feature extraction capabilities.

Alibaba Cloud just released Qwen3, the latest model in the popular Qwen series. It outperforms other top-tier reasoning LLMs such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro.

 
