Vision Language Models
The advent of Generative AI has fundamentally transformed robotic intelligence, enabling significant strides in how advanced humanoid robots “perceive, reason, and act” in the physical world. This huge
Fine-tuning Gemma 3 allows us to adapt this advanced model to specific tasks, optimizing its performance for domain-specific applications. By leveraging QLoRA (Quantized Low-Rank Adaptation) and Transformers
In this article, we explore OmniParser, a UI screen parsing pipeline that combines a fine-tuned YOLO model for icon detection with Florence-2 for icon recognition and icon description generation.