When it comes to writing optimized code, image loading plays an important role in computer vision. This process is often a bottleneck in CV tasks and a common culprit behind poor performance. We need to get images from the disk as fast as possible.
The most obvious example of the importance of this task is the implementation of a Dataloader class in any CNN training framework. It is crucial to make image loading fast: otherwise, the training procedure becomes CPU-bound and wastes precious GPU time.
Today we are going to look at some Python libraries which allow us to read images most efficiently. They are —
- OpenCV
- Pillow
- Pillow-SIMD
- TurboJpeg
Also, we will cover alternative methods of image loading from databases using:
- LMDB
- TFRecords
Finally, we will compare the loading time per image and find out which one is the winner!
Installation
Before we start, we need to create a virtual environment:
$ virtualenv -p python3.7 venv
$ source venv/bin/activate
Then, install the required libraries:
$ pip install -r requirements.txt
Now we can go forward with our tasks.
Ways to load images
Structure
Usually, we need to load several images that are stored either in a database or just as a folder. In our scenario, an abstract image loader should be able to store the path to such a database or folder and load one image at a time from it. Moreover, we need to measure the time of some parts of the code. Optionally, some initialization may be required before the loading starts. Our ImageLoader class looks like this:
import os
from abc import abstractmethod


class ImageLoader:
    extensions: tuple = (
        ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".tfrecords"
    )

    def __init__(self, path: str, mode: str = "BGR"):
        self.path = path
        self.mode = mode
        self.dataset = self.parse_input(self.path)
        self.sample_idx = 0

    def parse_input(self, path):
        # single image or tfrecords file
        if os.path.isfile(path):
            assert path.lower().endswith(
                self.extensions,
            ), f"Unsupported extension, please use one of {self.extensions}"
            return [path]

        if os.path.isdir(path):
            # lmdb environment
            if any([file.endswith(".mdb") for file in os.listdir(path)]):
                return path
            else:
                # folder with images
                paths = [os.path.join(path, image) for image in os.listdir(path)]
                return paths

    def __iter__(self):
        self.sample_idx = 0
        return self

    def __len__(self):
        return len(self.dataset)

    @abstractmethod
    def __next__(self):
        pass
Image decoding functions in different libraries can return images in different formats: RGB or BGR. In our case, we use the BGR color mode as the default, but it can always be converted into the required format. If you want to know the fun reason why OpenCV uses the BGR format, click on this link.
Now we can inherit new classes from the base class and use them for our task.
OpenCV
The first one is the OpenCV library. We can use one simple function to read an image from the disk: cv2.imread.
from timeit import default_timer as timer

import cv2


class CV2Loader(ImageLoader):
    def __next__(self):
        start = timer()
        # get image path by index from the dataset
        path = self.dataset[self.sample_idx]
        # read the image
        image = cv2.imread(path)
        full_time = timer() - start
        if self.mode == "RGB":
            start = timer()
            # change color mode
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            full_time += timer() - start
        self.sample_idx += 1
        return image, full_time
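For illustration, here is a minimal usage sketch of the loader above; it assumes the images/pexels folder from the demo commands and simply prints the per-image loading time:

loader = CV2Loader("images/pexels", mode="BGR")
for _ in range(len(loader)):
    # each call loads one image and returns it together with its loading time
    image, load_time = next(loader)
    print(f"{image.shape} loaded in {load_time:.6f} s")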
Before image visualization, we need to mention that the OpenCV cv2.imshow function requires an image in BGR format. Some libraries use RGB as the default image mode; in that case, we convert the image to BGR for correct visualization.
You can try to load your image using our example with this function.
To test the OpenCV library, please, use this command:
$ python3 show_image.py --path images/cat.jpg --method cv2
This and the following commands in the text will show you the image and its loading time using different libraries.
If everything goes well, you will see an image in the window like this:
Also, you can show all images from a folder. Instead of using a specific image, specify a path to the folder with images:
$ python3 show_image.py --path images/pexels --method cv2
This will show you all images from the folder one at a time together with their loading times. To stop the demo, you can press the ESC button.
Pillow
Let’s now try the PIL library. We can read an image using the Image.open function.
from timeit import default_timer as timer

import cv2
import numpy as np
from PIL import Image


class PILLoader(ImageLoader):
    def __next__(self):
        start = timer()
        # get image path by index from the dataset
        path = self.dataset[self.sample_idx]
        # read the image as a numpy array
        image = np.asarray(Image.open(path))
        full_time = timer() - start
        if self.mode == "BGR":
            start = timer()
            # change color mode
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            full_time += timer() - start
        self.sample_idx += 1
        return image, full_time
We also convert the Image object to a Numpy array, since it’s likely we’d want to apply some augmentations or pre-processing next, and Numpy is the default choice for that.
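As a quick illustration of that conversion, here is a tiny sketch using the sample images/cat.jpg file from the demo commands:

from PIL import Image
import numpy as np

img = Image.open("images/cat.jpg")  # lazy PIL Image object in RGB mode
arr = np.asarray(img)               # decoding happens here, into an HxWx3 uint8 array
print(arr.shape, arr.dtype)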
To check this out on a single image you can use:
$ python3 show_image.py --path images/cat.jpg --method pil
If you want to use it on the folder with images:
$ python3 show_image.py --path images/pexels --method pil
Pillow-SIMD
There is a higher-performance fork of the Pillow library called Pillow-SIMD. It uses SIMD optimizations that allow reading and transforming images faster while keeping the same API as standard Pillow.
Pillow and Pillow-SIMD cannot be used simultaneously in the same virtual environment – Pillow-SIMD will be used by default.
To use Pillow-SIMD and avoid errors caused by having both libraries installed together, you need to create a new virtual environment and use
$ pip install pillow-simd
Or you can uninstall the previous Pillow version and install Pillow-SIMD:
$ pip uninstall pillow
$ pip install pillow-simd
You don’t need to change anything in the code – the previous example still works. To check that everything is fine, you can use the commands from the previous Pillow part:
$ python3 show_image.py --path images/cat.jpg --method pil
$ python3 show_image.py --path images/pexels --method pil
TurboJpeg
There is another library called TurboJpeg. As the name suggests, it can only read images compressed with JPEG.
Let’s create an image loader using TurboJpeg.
from timeit import default_timer as timer

from turbojpeg import TurboJPEG


class TurboJpegLoader(ImageLoader):
    def __init__(self, path, **kwargs):
        super(TurboJpegLoader, self).__init__(path, **kwargs)
        # create TurboJPEG object for image reading
        self.jpeg_reader = TurboJPEG()

    def __next__(self):
        start = timer()
        # open the input file as bytes
        file = open(self.dataset[self.sample_idx], "rb")
        full_time = timer() - start
        if self.mode == "RGB":
            mode = 0  # corresponds to TJPF_RGB
        elif self.mode == "BGR":
            mode = 1  # corresponds to TJPF_BGR
        start = timer()
        # decode the raw JPEG bytes into a numpy array
        image = self.jpeg_reader.decode(file.read(), mode)
        full_time += timer() - start
        self.sample_idx += 1
        return image, full_time
TurboJpeg works on the raw bytes of the compressed file: we read the image as a string of bytes and then decode it.
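For clarity, here is a minimal standalone decoding sketch; it assumes the PyTurboJPEG package, which also exports the TJPF_BGR pixel-format constant used below:

from turbojpeg import TurboJPEG, TJPF_BGR

jpeg_reader = TurboJPEG()
with open("images/cat.jpg", "rb") as f:
    raw = f.read()                         # compressed JPEG bytes
image = jpeg_reader.decode(raw, TJPF_BGR)  # numpy array in BGR channel order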
You can try it with the following commands. But remember that TurboJpeg only allows processing of .jpeg images:
$ python3 show_image.py --path images/cat.jpg --method turbojpeg
$ python3 show_image.py --path images/pexels --method turbojpeg
LMDB
A commonly used approach to image loading when speed is a priority is to convert the data into a better representation, such as a database or a serialized buffer, beforehand. One of the biggest advantages of such “databases” is that they operate with zero system calls per data access, while the file system requires several system calls per data access. We can create an LMDB database that collects all images in key-value format.
The following function allows us to create an LMDB environment with our images. An LMDB “environment” is essentially a folder with special files created by the LMDB library. The function only requires a list of image paths and a save path:
import os

import cv2
import lmdb
import numpy as np


def store_many_lmdb(images_list, save_path):
    # number of images in our folder
    num_images = len(images_list)
    # all file sizes
    file_sizes = [os.path.getsize(item) for item in images_list]
    # the maximum file size index
    max_size_index = np.argmax(file_sizes)
    # maximum database size in bytes
    map_size = num_images * cv2.imread(images_list[max_size_index]).nbytes * 10
    # create lmdb environment
    env = lmdb.open(save_path, map_size=map_size)
    # start writing to environment
    with env.begin(write=True) as txn:
        for i, image in enumerate(images_list):
            with open(image, "rb") as file:
                # read image as bytes
                data = file.read()
            # get image key
            key = f"{i:08}"
            # put the key-value pair into the database
            txn.put(key.encode("ascii"), data)
    # close the environment
    env.close()
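A possible invocation of this function looks like the following sketch (the glob pattern is just an assumption about how the images are named):

import glob

image_paths = sorted(glob.glob("images/pexels/*.jpg"))
store_many_lmdb(image_paths, "lmdb/images")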
There is a Python script that creates an LMDB environment with images:
- the --path argument should contain the path to your folder with collected images
- the --output argument should be a directory where the LMDB will be created
$ python3 create_lmdb.py --path images/pexels --output lmdb/images
Now that the LMDB environment has been created, we can load our images from it. Let’s create a new loader class.
To load images from the database, we need to open it for reading. This is done by a new method called open_database. It returns a cursor to navigate through the opened database. When this cursor reaches the end of the data, we need to return it to the start of the database, which is handled by the __iter__ method.
LMDB allows us to store the data, but it has no built-in decoder for images, so we will use the cv2.imdecode function here.
from timeit import default_timer as timer

import cv2
import lmdb
import numpy as np


class LmdbLoader(ImageLoader):
    def __init__(self, path, **kwargs):
        super(LmdbLoader, self).__init__(path, **kwargs)
        self.path = path
        self._dataset_size = 0
        self.dataset = self.open_database()

    # we need to open the database to read images from it
    def open_database(self):
        # open the environment by path
        lmdb_env = lmdb.open(self.path)
        # start reading
        lmdb_txn = lmdb_env.begin()
        # create cursor to iterate through the database
        lmdb_cursor = lmdb_txn.cursor()
        # get number of items in the full dataset
        self._dataset_size = lmdb_env.stat()["entries"]
        return lmdb_cursor

    def __iter__(self):
        # set the cursor to the first database element
        self.dataset.first()
        return self

    def __next__(self):
        start = timer()
        # get raw image
        raw_image = self.dataset.value()
        # convert it to numpy
        image = np.frombuffer(raw_image, dtype=np.uint8)
        # decode image
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        full_time = timer() - start
        if self.mode == "RGB":
            start = timer()
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            full_time += timer() - start
        start = timer()
        # step to the next element in the database
        self.dataset.next()
        full_time += timer() - start
        return image, full_time

    def __len__(self):
        # get dataset length
        return self._dataset_size
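The loader above walks the database sequentially with a cursor, but since LMDB is a key-value store, you can also fetch a single image by its key. Here is a short sketch, assuming the zero-padded key scheme used when the database was created:

import cv2
import lmdb
import numpy as np

env = lmdb.open("lmdb/images", readonly=True)
with env.begin() as txn:
    # keys were written as f"{i:08}", e.g. "00000000" for the first image
    raw = txn.get("00000000".encode("ascii"))
    image = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)
env.close()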
After we have created the environment and the loader class, we can check its correctness and show images from it. Now the --path argument should point to the LMDB environment. Remember that you can stop the demo by pressing the ESC button.
$ python3 show_image.py --path lmdb/images --method lmdb
TFRecords
Another useful database is TFRecords. To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200 MB each) that can each be read linearly (TensorFlow manual).
Before we create the tfrecords file, we need to choose the structure of the database. TFRecords allows keeping items with many additional features. You can save the file name or the image width and height if needed. All these things should be collected in a Python dictionary, e.g.:
image_feature_description = {
    "height": tf.io.FixedLenFeature([], tf.int64),
    "width": tf.io.FixedLenFeature([], tf.int64),
    "filename": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
    "image_raw": tf.io.FixedLenFeature([], tf.string),
}
In our example, we will use only the image in raw byte format and its unique key called “label.”
import os

import tensorflow as tf


def _byte_feature(value):
    """Convert string / byte into bytes_list."""
    if isinstance(value, type(tf.constant(0))):
        # BytesList can't unpack string from EagerTensor.
        value = value.numpy()
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    """Convert bool / enum / int / uint into int64_list."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def image_example(image_string, label):
    feature = {
        "label": _int64_feature(label),
        "image_raw": _byte_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def store_many_tfrecords(images_list, save_file):
    assert save_file.endswith(
        ".tfrecords"
    ), 'File path is wrong, it should contain "*myname*.tfrecords"'

    directory = os.path.dirname(save_file)
    if not os.path.exists(directory):
        os.makedirs(directory)

    # start writer
    with tf.io.TFRecordWriter(save_file) as writer:
        # cycle over each image path
        for label, filename in enumerate(images_list):
            # read the image as a byte string
            image_string = open(filename, "rb").read()
            # save the data as a tf.Example object
            tf_example = image_example(image_string, label)
            # and write it into the database
            writer.write(tf_example.SerializeToString())
Please note that when we read the images back, we decode them with the tf.image.decode_jpeg function because all our images are stored as JPEG files. You can also use tf.image.decode_image as a universal decoder.
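The reading side is not shown above, but a minimal sketch of it could use tf.data.TFRecordDataset together with the two features we stored. This is an assumed illustration, not the loader class used in the benchmark:

import tensorflow as tf

feature_description = {
    "label": tf.io.FixedLenFeature([], tf.int64),
    "image_raw": tf.io.FixedLenFeature([], tf.string),
}

def _parse(example_proto):
    features = tf.io.parse_single_example(example_proto, feature_description)
    # decode_jpeg works here because we stored raw JPEG bytes;
    # tf.image.decode_image is the format-agnostic alternative
    image = tf.image.decode_jpeg(features["image_raw"], channels=3)
    return image, features["label"]

dataset = tf.data.TFRecordDataset("tfrecords/images.tfrecords").map(_parse)
for image, label in dataset.take(1):
    print(image.shape, label.numpy())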
To check the correctness of the created database you can show images from it:
$ python3 show_image.py --path tfrecords/images.tfrecords --method tfrecords
Loading time comparison
Now we have five different methods of image loading. Let’s find out which one is the best!
We will use some open images from pexels.com with different shapes, all stored as JPEG files. All time measurements will be averaged over 5000 iterations. Averaging also mitigates the impact of OS- and hardware-specific behavior such as data caching: the first iteration of the first method under evaluation suffers from the initial loading of the data from disk into the cache, while the other methods are free of that.
All experiments are run for both BGR and RGB image modes to cover all potential needs and different tasks. Please remember that Pillow and Pillow-SIMD cannot be used in the same virtual environment; to build the final comparison table, we ran two separate experiments for Pillow and Pillow-SIMD.
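The timing loop itself can be as simple as the sketch below (the repository's benchmark.py may differ in details such as warm-up handling):

import numpy as np

def benchmark(loader, iterations=5000):
    times = []
    iterator = iter(loader)
    for i in range(iterations):
        # restart the iterator each time the dataset is exhausted
        if i % len(loader) == 0:
            iterator = iter(loader)
        _, load_time = next(iterator)
        times.append(load_time)
    return float(np.mean(times)), float(np.median(times))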
To run the measurements use:
$ python3 benchmark.py --path images/pexels --method cv2 pil turbojpeg lmdb tfrecords --iters 100 --mode BGR
| Library | Mode | Mean read time (sec) | Median read time (sec) |
|---|---|---|---|
| OpenCV | BGR | 0.003591 | 0.0010559 |
| OpenCV | RGB | 0.003731 | 0.0010915 |
| Pillow | BGR | 0.004018 | 0.0012519 |
| Pillow | RGB | 0.003960 | 0.0012235 |
| Pillow-SIMD | BGR | 0.002825 | 0.0008151 |
| Pillow-SIMD | RGB | 0.002791 | 0.0007866 |
| TurboJpeg | BGR | 0.002259 | 0.0006032 |
| TurboJpeg | RGB | 0.002257 | 0.0006026 |
| LMDB | BGR | 0.003509 | 0.0009936 |
| LMDB | RGB | 0.003560 | 0.0010263 |
| TFRecords | BGR | 0.002818 | 0.0010221 |
| TFRecords | RGB | 0.002640 | 0.0009445 |
Moreover, it is interesting to compare the reading speed of the two databases with the same decoder function, which shows which database delivers its data faster. In this case, we use the cv2.imdecode function for both TFRecords and LMDB.
| Loader | Mean read time (sec) | Median read time (sec) |
|---|---|---|
| TFRecords | 0.004182 | 0.001356 |
| LMDB | 0.003688 | 0.001023 |
All experiments were run on:
- Intel® Core™ i7-2600 CPU @ 3.40GHz × 8
- Ubuntu 16.04 64-bit
- Python 3.7
Summary
In this post, we considered several approaches to image loading and compared them with each other. The comparison results on JPEG images are really interesting. TurboJpeg is the fastest library for loading images as numpy arrays, with one caveat: it can only read files with the JPEG extension.
Another important thing to mention is that Pillow-SIMD is faster than the original Pillow: in our task the loading speed increased by nearly 40%.
If you plan to use an image database, TFRecords shows better mean results than LMDB, in particular because of its built-in decoder function. On the other hand, with the same decoder, LMDB reads its data faster. Of course, you can always combine a decoder function and a database, for example, use TurboJpeg as the decoder and LMDB as the image storage, as in the sketch below.
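A sketch of that combination, assuming the LMDB environment created earlier and the TJPF_BGR constant exported by PyTurboJPEG:

import lmdb
from turbojpeg import TurboJPEG, TJPF_BGR

jpeg_reader = TurboJPEG()
env = lmdb.open("lmdb/images", readonly=True)
with env.begin() as txn:
    # keys follow the zero-padded index scheme used by store_many_lmdb
    raw = txn.get("00000000".encode("ascii"))
    image = jpeg_reader.decode(raw, TJPF_BGR)
env.close()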