Ultralytics, the company renowned for developing the YOLOv8 model, has recently released Ultralytics Explorer, a new exploratory data analysis tool for computer vision image datasets. Ultralytics Explorer provides both a Python API and a GUI, letting you choose whichever suits your workflow. This article explores the functionality of the Ultralytics Explorer API and its use cases. Instead of just walking through the documentation, we’ll take a practical approach and visualize a custom wildlife animal dataset using the Explorer API. With the Ultralytics Explorer API, you can gain new insights and efficiencies in your computer vision projects.
All the code discussed in this article is free to grab. Just hit the “Download Code” button to get started.
What is Ultralytics Explorer API?
Ultralytics Explorer API is a Python API, built into the Ultralytics library, that you can use to explore your datasets. It offers a range of functionality, including SQL query search, vector similarity search, AI semantic search, and more. Under the hood, the Explorer API uses LanceDB as the backend database for storing and querying image embeddings, allowing efficient storage and retrieval of high-dimensional data by leveraging LanceDB’s vector search capabilities.
LanceDB is an open-source AI vector database that efficiently stores, manages, and retrieves embeddings for large-scale multi-modal data. A vector database manages and searches vector data, which is essential in AI for handling the embeddings that deep learning models produce from various data types. LanceDB is powered by the Lance file format and supports compute-storage separation, allowing you to scale locally to large datasets without running out of memory. It stores embeddings using specialized vector indexes for fast retrieval and supports similarity searches based on distance metrics such as Euclidean distance or cosine similarity.
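To make the backend concrete, here is a minimal sketch of how a LanceDB table stores vectors and answers a nearest-neighbor query. The table name, toy vectors, and metadata field are made up for illustration; Explorer manages all of this for you automatically.

import lancedb

# Connect to (or create) a local LanceDB database directory
db = lancedb.connect("./toy_lancedb")

# Each row pairs a vector (embedding) with arbitrary metadata
data = [
    {"vector": [0.9, 0.1], "im_file": "zebra_1.jpg"},
    {"vector": [0.8, 0.2], "im_file": "zebra_2.jpg"},
    {"vector": [0.1, 0.9], "im_file": "rhino_1.jpg"},
]
table = db.create_table("toy_embeddings", data=data, mode="overwrite")

# Nearest-neighbor search: rows come back sorted by L2 distance
results = table.search([0.85, 0.15]).limit(2).to_pandas()
print(results[["im_file", "_distance"]])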
We will explore all the features of our custom data in this article. Let’s dive into some hands-on activities now.
Dataset Overview
The dataset we are using with the Ultralytics Explorer API is a wildlife animal dataset of 1,500 images spanning four classes: buffalo, elephant, rhino, and zebra. We provide the dataset along with the code, which you can download below.
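Explorer expects the dataset to be described by a standard YOLO-style data.yaml file. Here is a sketch of what ours might look like, with hypothetical paths you should adjust to wherever you unpack the dataset:

# data.yaml (hypothetical paths; standard YOLO dataset config)
path: ./wildlife-dataset   # dataset root directory
train: images/train        # training images, relative to path
val: images/val            # validation images, relative to path

names:
  0: buffalo
  1: elephant
  2: rhino
  3: zebra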
Setting up the Environment
First, we need to install the Ultralytics Explorer library on our local system. We can do that by using the following command:
pip install ultralytics[explorer] openai
We use pip to install the package, along with the openai package, which is needed for the Ask AI feature later in our code. After installation, we are ready to import the library into our codebase.
from ultralytics import Explorer
Following the import, we create an Explorer object, which will be utilized across our code for various tasks.
exp = Explorer("path/to/your/data.yaml", model="path/to/your/model.pt")
exp.create_embeddings_table()
After creating the exp object, we use the create_embeddings_table method, which checks whether an embeddings table already exists or needs to be created. It then loads the dataset, generates embeddings for each image, and stores them in a LanceDB table.
exp.create_embeddings_table(force=True)
The embedding table for a given dataset and model pair is created only once and reused thereafter. LanceDB scales on disk, storing all the embeddings on your local system without running out of memory. If you want to force the embeddings to be regenerated, you can pass force=True to the create_embeddings_table method, as shown above.
These embeddings are generated using a YOLOv8 model. The Ultralytics Explorer API supports all the popular datasets and pre-trained YOLOv8 models out of the box. Here, we used a custom wildlife animal dataset and a custom YOLOv8 model fine-tuned on our data. We won’t cover the fine-tuning itself in this article, but if you want to learn more, check out our detailed guide on fine-tuning YOLOv8.
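For reference, if you want to try Explorer before preparing custom data, a built-in dataset and a pre-trained model work the same way (both coco128.yaml and yolov8n.pt are stock Ultralytics assets that are downloaded automatically):

from ultralytics import Explorer

# Built-in dataset + pre-trained weights, fetched automatically
exp = Explorer(data="coco128.yaml", model="yolov8n.pt")
exp.create_embeddings_table()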
Similarity Search
Similarity search is a method for finding images that are similar to a specified image. It relies on the idea that images with similar characteristics will have comparable embeddings. After creating the embeddings table, you can conduct a semantic search in any of the following ways:
For a specific index or range of indices in the dataset:
similar = exp.get_similar(idx=[1000, 1020], limit=10)
similar.head()
We can pass the index numbers of the input images in the idx parameter and set a limit on the number of output images we want to visualize.
For any image or set of images not present in the dataset:
similar = exp.get_similar(img=["path/to/img1", "path/to/img2"], limit=10)
We can also provide the paths of input images in the img argument to perform the same kind of search. The results are impressive.
When using multiple inputs, their embeddings are combined for the search. The result is a pandas DataFrame containing a limited number of the data points most similar to your input, including the distance to each of them in the embedding space.
As you can see, the distance between vector points (image embeddings) increases as we move from the top to the bottom of the table. The get_similar function identifies the closest matching images by measuring the distance between your input image and every image in the dataset, delivering the nearest neighbors of your input.
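To see those distances yourself, you can inspect the returned DataFrame directly. A short sketch; the im_file and _distance column names follow LanceDB conventions and are an assumption here, so print similar.columns to confirm on your version:

similar = exp.get_similar(idx=10, limit=10)

# Inspect nearest neighbors and their embedding-space distances
# (column names assumed; verify with similar.columns)
print(similar[["im_file", "_distance"]])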
Plotting Similar Images
We can plot similar images using the plot_similar method. It takes the same arguments as get_similar and plots the similar images.
exp.plot_similar(idx=10, limit=10)
exp.plot_similar(idx=[100,701], limit=100) # Can also pass list of idxs or imgs
The results are shown below for your reference.
Comparison of Results: Pre-trained vs. Fine-tuned
To understand the concept better, we ran a comparison: we used the pre-trained and fine-tuned models separately to generate image embeddings, then performed the same similarity search on both sets of embeddings.
Fine-tuned model embeddings tend to produce accurate, genuinely similar images without errors, unlike the pre-trained model, which often returns incorrect or irrelevant images.
Fine-tuned model embeddings are more accurate for our custom dataset: because the custom model is fine-tuned on our data, it generates better embeddings.
The image embeddings from our custom model are better and more accurate, as it can find similar images without any error even when the limit is set to 100.
Comparing these results with the same search run on the default YOLOv8 model, you can notice the errors: its image embeddings are less accurate, so the similarity search fails to find truly similar images beyond a certain limit.
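Here is a sketch of how such a comparison can be set up: build one Explorer per model over the same dataset and run an identical query against each. The weight paths are hypothetical; best.pt stands in for your fine-tuned checkpoint.

# Hypothetical paths: use your own data.yaml and fine-tuned weights
exp_pretrained = Explorer("path/to/your/data.yaml", model="yolov8n.pt")
exp_finetuned = Explorer("path/to/your/data.yaml", model="path/to/best.pt")

for e in (exp_pretrained, exp_finetuned):
    e.create_embeddings_table()
    e.plot_similar(idx=10, limit=100)  # same query, different embeddings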
Ask AI (Natural Language Querying)
The Ultralytics Explorer API has built-in support for querying with an LLM. If you don’t want to write SQL queries yourself, you can use the LLM to convert your plain-text request into an SQL query and pass it to the vector database.
from ultralytics.data.explorer import plot_query_result

df = exp.ask_ai("show me 10 images containing exactly 2 persons")
print(df.head())
# plot the results
plt = plot_query_result(df)
plt.show()
The ask_ai feature enables semantic search by interpreting the user’s natural-language description of what they are looking for, converting that description into a form the database can understand (an SQL query), and then retrieving relevant images based on that interpretation. Below are the results of this code snippet.
You need an OpenAI API key to run this feature, which you can get on their official site.
Once you’ve set up your API key, simply provide it when running the ask_ai code cell.
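If you prefer not to enter the key interactively each time, you can store it in the Ultralytics settings from the command line (the same mechanism the GUI uses, shown later in this article):

yolo settings openai_api_key="..."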
SQL Querying
The Ultralytics Explorer API lets you explore specific entries in your dataset by running SQL queries. It supports two formats:
- Short-hand queries starting with “WHERE” automatically select all columns.
- Full queries allow you to specify which columns to select.
table = exp.sql_query("WHERE labels LIKE '%zebra%' OR labels LIKE '%rhino%' LIMIT 10")
print(table.head())
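The snippet above uses the short-hand form. For comparison, a full query names the columns it selects; here the Explorer table is referred to as 'table', and im_file and labels are the columns we have seen in the results so far:

# Full-form query: select specific columns instead of all of them
table = exp.sql_query("SELECT im_file, labels FROM 'table' WHERE labels LIKE '%buffalo%' LIMIT 10")
print(table.head())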
The sql_query method allows you to run more customized searches on the dataset, combining SQL queries with semantic search to filter down to specific types of results. The results can be seen here:
Plotting SQL Query Results
You can also plot the results of an SQL query using the plot_sql_query method. It takes the same arguments as sql_query and plots the results in a grid.
exp.plot_sql_query("WHERE labels LIKE '%elephant%' LIMIT 20", labels=True)
The results are shown below for better visualization:
In a future article, we’ll delve into advanced topics, exploring the capabilities of working directly with embedding tables and utilizing Explorer’s advanced querying, indexing, and visualization techniques. Stay tuned for an in-depth guide on leveraging these powerful features for enhanced data analysis.
Ultralytics Explorer GUI
Explorer GUI is a user interface (UI) for the Ultralytics Explorer API built with Streamlit. You can run similarity searches, SQL queries, and even Ask AI searches without writing any code. You simply need to install the Ultralytics Explorer library in your local Python environment. Here, we worked with the VOC dataset and the YOLOv8l model, both supported by the Ultralytics Explorer API out of the box.
pip install ultralytics[explorer]
After this, you need to run the following command:
yolo explorer
After the command executes successfully, a web interface will appear, where we can use all the Ultralytics Explorer API features.
Semantic Search / Vector Similarity Search
Semantic search is a method used to find images similar to a chosen image or set of images. It works on the principle that similar images share similar characteristics, which are captured by their embeddings. In the user interface, you can pick one or more images and look for others that resemble them. This is helpful when you’re trying to find more images like a specific one, or when you want to identify images the model isn’t handling as intended.
You can specify how many images you want to see and the index of the input image in the dataset, and the explorer will show you the images most similar to your chosen one.
Explorer API will access the vector database (in this case, LanceDB) and find and display the most similar feature vectors or embeddings that correspond to the input image.
You can choose any of the displayed images to find images similar to the corresponding image.
Ask AI
The Explorer GUI offers the same Ask AI feature, using the same workflow as the Ultralytics Explorer API. You simply give it a prompt like “Show 10 images with exactly 5 persons,” and you will see a result like this:
To use the Ask AI feature, you need to set your OpenAI API key with the following command:
yolo settings openai_api_key="..."
This operates with OpenAI’s Large Language Model (LLM) at its core, so the outcomes are based on probabilities and might not always be accurate.
Run SQL queries on your CV dataset
You can run SQL queries to filter exactly what you want to see in your dataset.
Here, we used the query WHERE labels LIKE '%person%' AND labels LIKE '%car%'. You can write more complex queries for finer-grained results. This feature runs the Ultralytics Explorer API’s SQL query under the hood and displays the results.
Vector Search Algorithms Used In Explorer API
Vector Search is a way to find the closest matching items in a database. It’s used in recommendation systems or search engines to help you find products similar to what you’re looking for. In AI applications, including Large Language Models (LLM), each piece of data is represented by unique patterns, called embeddings, which the system uses to identify the most relevant matches.
The search works by looking for the closest neighbors to your search query in a space filled with these data points, a method known as finding the K-Nearest-Neighbors (KNN).
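As an illustration of what KNN does under the hood, here is a tiny brute-force sketch in NumPy. Real vector databases like LanceDB use indexes to avoid scanning every point, but the idea is the same:

import numpy as np

# Toy "database" of 2-D embeddings and a query vector
embeddings = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
query = np.array([0.85, 0.15])

# L2 (Euclidean) distance from the query to every stored embedding
distances = np.linalg.norm(embeddings - query, axis=1)

# Indices of the k nearest neighbors, closest first
k = 2
nearest = np.argsort(distances)[:k]
print(nearest, distances[nearest])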
The Ultralytics Explorer API uses LanceDB for this search. In LanceDB, a “metric” measures the distance between two data points; two types of metric are currently supported:
- L2 (Euclidean distance)
- Cosine.
The Explorer uses the L2 metric for similarity searches, finding items closest to your query based on this measurement.
L2
L2 distance, often called Euclidean distance, is like measuring the straight-line distance between two points in space. Imagine you’re on a flat surface with two dots; the L2 distance would be the length of a straight line drawn between them. Mathematically, if you have two points, P and Q, with coordinates (p1, p2, …, pn) and (q1, q2, …, qn) in an n-dimensional space, the L2 distance between them is calculated as:

d(P, Q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
This formula essentially sums up the squares of the differences between the corresponding coordinates of the points and then takes the square root of that sum.
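For example, for P = (1, 2) and Q = (4, 6), the L2 distance is √((4 − 1)² + (6 − 2)²) = √(9 + 16) = 5.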
Cosine
Cosine similarity measures the cosine of the angle between two vectors in an n-dimensional space, giving us an idea of how similar the directions of the two vectors are, regardless of their magnitude. It’s like comparing the directions in which two arrows are pointing rather than how long they are. The formula for cosine similarity between two vectors A and B is:

cos(θ) = (A · B) / (||A|| × ||B||)
Where A · B is the dot product of A and B, and ||A|| and ||B|| are the magnitudes (or lengths) of A and B, respectively. The dot product adds up the products of the corresponding entries of the two vectors, and the magnitudes are calculated using the L2 formula applied to each vector individually. The result ranges from -1 to 1, where 1 means the vectors point in the same direction, -1 means they point in opposite directions, and 0 indicates orthogonality (no similarity).
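The same calculation takes a couple of lines with NumPy; a quick sketch using two toy vectors:

import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([2.0, 4.0, 6.0])  # same direction as A, twice the magnitude

# Cosine similarity: dot product divided by the product of magnitudes
cos_sim = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_sim)  # 1.0, because A and B point in the same direction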
For more information, read the articles in the reference section.
Conclusion
As we conclude our exploration of the Ultralytics Explorer API, it’s clear that this tool represents a significant step forward in exploring image datasets for computer vision. The seamless combination of a Python API and a GUI democratizes advanced data analysis, making it accessible to both enthusiasts and experts. With the Explorer API’s robust functions, from similarity searches to AI-powered queries, users can easily gain deep insights from their data. With the Ultralytics Explorer API, you will be able to navigate the complexity of computer vision with ease and accuracy.
This article has been promoted by Ultralytics and designated the official article for the Ultralytics Explorer API.
Reference
Ultralytics Explorer Documentation