In the groundbreaking paper “Attention Is All You Need”, the Transformer architecture was introduced for sequence-to-sequence tasks in NLP. Models like BERT and GPT were built on top of Transformers ...
Sapiens: Foundation for Human Vision Models by Meta
Sapiens, a family of foundational Human Vision Models by Khirodkar et al. from Meta, achieves state-of-the-art results for human-centric tasks like 2D pose estimation, body-part segmentation, depth ...
ColPali: Enhancing Financial Report Analysis with Multimodal RAG and Gemini
ColPali multimodal RAG offers a novel approach for efficient retrieval of elements such as images, tables, charts, and text by treating each page as an image. This method takes advantage of Vision ...
SDXL Inpainting: Fusing Image Inpainting with Stable Diffusion
Suppose you have an old childhood photo with your parents that is close to your heart. Unfortunately, some parts of it have become damaged or corrupted over time. But what if I told you that ...
Deploying a Deep Learning Model using Hugging Face Spaces and Gradio
In deep learning, training a model is not the final step. Be it image classification or object detection, a deep learning project becomes worthwhile only when it reaches the masses. That's where ...