Image classification is used to solve a wide range of Computer Vision problems, from medical diagnosis and surveillance systems to monitoring agricultural farms. There are countless possibilities to explore with Image Classification.
If you have completed a basic Computer Vision course, you are already familiar with the routines involved in Image Classification. Want to know more?
Image Classification tasks follow a standard flow: you pass an image to a deep learning model and it outputs the class, or label, of the object present.
While learning Computer Vision, your first "hello world" project will most likely be an image classifier: something like digit recognition on the MNIST dataset, or perhaps the Cats vs. Dogs classification problem.
The Deep Learning wave started in 2012, primarily due to AlexNet, a Convolutional Neural Network (CNN) image classifier.
Image classifiers not only have a big place in industrial applications, they are also a very natural way to learn about Computer Vision and CNNs.
The most popular frameworks for creating image classifiers are Keras (a high-level API on top of TensorFlow) and PyTorch. Even if you have never coded a neural network, these libraries let you create and train a decent deep learning model with just a few lines of code. You don't need to build anything from scratch; simply use the built-in features, and you are done in a matter of minutes!
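For instance, here is a minimal sketch (not part of this tutorial's pipeline) of how little code a basic Keras classifier needs; the dataset, layer sizes, and number of epochs are just illustrative choices:
import tensorflow as tf

# Load MNIST and scale pixel values to the 0-1 range
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A tiny fully-connected classifier
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))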
Another important Computer Vision problem, closely related to classification, is Object Detection: you pass an image to a deep learning model and get back class labels along with bounding box coordinates that tell you exactly where each object is located in the image.
This extra location information can be really useful in real-life applications such as self-driving cars. In such applications you need to know where exactly the other cars are present on the road.
You might start thinking now: Can you create an Object Detector in Keras?
Of course, you can; but you may have to write a lot of code. You would also need a lot of prior knowledge to be able to train the Object Detector correctly. It’s also really easy to make mistakes and the hardest part is to debug the code in case of errors.
If you still have doubts then just take a look at this Official Keras code for creating and training a custom RetinaNet detector.
Fortunately, if you want to train an Object Detector, you won't need to code everything in Keras. The TensorFlow team provides the TensorFlow Object Detection (TFOD) API – a high-level API built on top of TensorFlow specifically to make it easier to train Object Detection models.
Usually, beginners get comfortable creating classifiers but take time to move to the TFOD API for detection models, because they run into installation problems, heavy computation requirements, confusing steps, lots of scripts to run, and so on.
From a beginner’s perspective, these are all valid issues but I would say the TFOD API has gotten a lot better in the past few months. Especially with TF 2.0 support, it comes with many models, better integration with Keras, and much more.
Still, the fact remains that for early practitioners, this API can be a bit intimidating to explore.
At this stage, you might start wondering: why deal with such a complicated API and all of its features? Can't we simply modify our classifiers so they perform detection as well?
Well, actually YOU CAN.
How to make Classifiers perform detection?
The answer would be to create a multi-output classification network with two output branches. One branch can output class labels and the second one can output your bounding box coordinates.
If you are not familiar with Multi-Output Models, think of them as architectures that have multiple branches inside the same network, each producing a different output.
Still feel confused about how a multi-output model can be created for detection purposes? Don't worry, we are here to help. You may want to watch this lecture on Object Localization by Andrew Ng from his deeplearning.ai specialization; he explains the full concept in a really simple and clear manner.
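To make the idea concrete, here is a minimal, hypothetical sketch of a two-branch network written with the Keras functional API; the input size, backbone, and layer widths are placeholders, and the model we actually train is built later in Step 6:
import tensorflow as tf
from tensorflow.keras import layers, Model

# Placeholder backbone: a single conv layer followed by global pooling
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(16, 3, activation='relu')(inputs)
x = layers.GlobalAveragePooling2D()(x)

# Branch 1: class probabilities for 3 classes
class_head = layers.Dense(3, activation='softmax', name='class_output')(x)

# Branch 2: bounding box (x1, y1, x2, y2) scaled to the 0-1 range
box_head = layers.Dense(4, activation='sigmoid', name='box_output')(x)

# One network, two outputs
model = Model(inputs=inputs, outputs=[box_head, class_head])
model.summary()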
You might ask: if creating detectors this way is really so easy and effective, then why don't more people do it?
Well, there’s a catch. Simply speaking, the approach I’ve described above is not exactly Object Detection. It’s actually called Classification with Localization.
Why is it so?
In normal Object Detection, you can detect multiple objects in a single image. On the other hand, a classification-with-localization setup is limited to detecting only one object per image. (You could theoretically add more output boxes, but the network's performance would be really poor.)
Here’s an image that explains the difference.
Classification With Localization (Keras Code):
Now that we have understood what we want to achieve, let’s start with the code. Make sure to download the code from the Download Section.
Here’s our outline of the pipeline:
- Step 1: Downloading the Detection Dataset
- Step 2: Perform Data Augmentation with imgaug library
- Step 3: Preprocess the Dataset
- Step 4: Visualize the Data with its Annotations
- Step 5: Split the Data into Train and Validation Set
- Step 6: Construct the Multi-Output Model
- Step 7: Compile & Train the Model
- Step 8: Plot Model’s Loss & Accuracy Curves
- Step 9: Test your model on New Images
Step 1: Downloading the Detection Dataset
Let’s start by downloading the dataset. We have extracted a subset of the CALTECH-101 dataset containing images for only 3 classes: elephant, butterfly, and cougar_face. The class labels and bounding box coordinates for each image are provided as .MAT (MATLAB) files.
After downloading the data, you will need to extract it.
# Download the dataset
!wget -O CALTECH.zip https://zenodo.org/record/4126613/files/CALTECH.zip?download=1

from zipfile import ZipFile

# Extract all the contents of the zip file into the current directory
with ZipFile('CALTECH.zip', 'r') as zipObj:
    zipObj.extractall()
Step 2: Perform Data Augmentation with imgaug library
Data Augmentation is a very useful technique to expand a dataset and make a model generalize better.
There are many types of augmentations we can perform on images. Normally, when you're dealing with a plain image classification problem, you can do augmentation with the built-in Keras class –
tf.keras.preprocessing.image.ImageDataGenerator
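For a plain classification task, a typical usage looks something like the sketch below (this snippet is illustrative only; the directory path and parameter values are placeholders and not part of this tutorial's pipeline):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Typical classification-only augmentation: the generator transforms the
# images, but it knows nothing about bounding boxes.
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True,
                             rescale=1./255)

# 'dataset/train' is a hypothetical directory with one sub-folder per class
train_generator = datagen.flow_from_directory('dataset/train',
                                              target_size=(224, 224),
                                              batch_size=32,
                                              class_mode='categorical')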
Note: Unfortunately, we can’t use the ImageDataGenerator in this notebook!
Why not?
We can't use it here because we employ spatial augmentations like shifts, rotations, shears, crops, zooms, etc. All of these fundamentally change the image, and when the image changes, the corresponding bounding box coordinates should change with it. Since the Keras augmentation class only modifies the image and not the box coordinates, using it would be a really bad idea for this task.
So what can we do?
Well, we have 3 options.
- Use the default Keras augmentation module, but restrict ourselves to augmentations that do not change the image spatially, e.g. centering, normalization, whitening, etc.
- Create our own custom augmentation functions which modify the images as well as the bounding boxes.
- Use a library like imgaug, which allows you to augment images and bounding box locations.
Now, which one should we use?
Option 1 is by far the least effective – if you’re not spatially modifying the image then you don’t have many options.
If you go with Option 2 then you would need to write a lot of boilerplate code and deal with bugs.
For this notebook, I went with Option 3, which is the most effective. The imgaug library allows us to apply a number of augmentations, and the best part is that the same transformations are also applied to the corresponding bounding boxes.
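As a quick standalone illustration (separate from the pipeline below, and using arbitrary values), here is how imgaug moves an image and its box together:
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

# A dummy image and a dummy box, just to show that imgaug transforms them together
image = np.zeros((200, 200, 3), dtype=np.uint8)
boxes = BoundingBoxesOnImage([BoundingBox(x1=50, y1=50, x2=150, y2=150)],
                             shape=image.shape)

# Rotate the image; the bounding box is rotated by the same amount
augmenter = iaa.Affine(rotate=30)
image_aug, boxes_aug = augmenter(image=image, bounding_boxes=boxes)
print(boxes_aug.bounding_boxes[0])  # the box coordinates have moved with the image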
With imgaug we will apply 7 different types of augmentations to our dataset:
- Rescaling the image
- Rotating the image
- Shifting the image
- Flipping the image
- Changing the brightness
- Blurring the image
- Adding Noise to the image
You can look at imgaug docs here.
The code we have written will randomly apply a subset of these augmentations to each image (the number is set in the code below).
The data augmentation scheme performed here is a modified version of the one by Asset Karazhay explained in this post.
Convert MAT Labels into CSV
To perform data augmentation, we first have to convert our labels into CSV format. This makes it easy to feed the labels to the imgaug library so that it can augment the images along with their bounding boxes.
Function to Extract MAT attributes:
The CALTECH-101 annotations come in the form of MATLAB files. Since we'll be modifying our classifier to do detection, we need the annotations in a more familiar format. So what we'll do is create a function to extract the class label and the bounding box coordinates (x1, y1, x2, y2) from each MAT file. We will also extract the name of the image so we can associate each MAT file with its corresponding image.
def extract_mat_contents(annot_directory, image_dir):

    # Load the MAT file
    mat = scipy.io.loadmat(annot_directory)

    # Get the height and width of our image
    height, width = cv2.imread(image_dir).shape[:2]

    # Get the bounding box co-ordinates
    x1, y2, y1, x2 = tuple(map(tuple, mat['box_coord']))[0]

    # The parent folder of the image has the same name as its class,
    # so we split the image path and pick that folder name as the class
    class_name = image_dir.split('/')[2]

    filename = '/'.join(image_dir.split('/')[-2:])

    # Return the extracted attributes
    return filename, width, height, class_name, x1, y1, x2, y2
The function above will be used in the function below to go through the image and annotation directories and to extract their information such as:
- Height
- Width
- Bounding box coordinates
- Class
Once extracted, this information will be stored in a pandas DataFrame, which makes it easy to use with imgaug when performing the augmentations.
# Function to convert MAT files to CSV
def mat_to_csv(annot_directory, image_directory, classes_folders):

    # List containing all our attributes regarding each image
    mat_list = []

    # Loop over each class and its labels one by one
    for class_folder in classes_folders:

        # Set our images and annotations directory
        image_dir = os.path.join(image_directory, class_folder)
        annot_dir = os.path.join(annot_directory, class_folder)

        # Get each file in the image and annotation directory
        mat_files = sorted(os.listdir(annot_dir))
        img_files = sorted(os.listdir(image_dir))

        # Loop over each image and its label
        for mat, image_file in zip(mat_files, img_files):

            # Full path to the MAT file
            mat_path = os.path.join(annot_dir, mat)

            # Full path to the image
            img_path = os.path.join(image_dir, image_file)

            # Get the attributes for each image
            value = extract_mat_contents(mat_path, img_path)

            # Append the attributes to mat_list
            mat_list.append(value)

    # Columns for the Pandas DataFrame
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin',
                   'xmax', 'ymax']

    # Create the DataFrame from mat_list
    mat_df = pd.DataFrame(mat_list, columns=column_name)

    # Return the DataFrame
    return mat_df
# The Classes we will use for our training
classes_list = sorted(['butterfly', 'cougar_face', 'elephant'])
# Set our images and annotations directory
image_directory = 'CALTECH/CALTECH_Dataset'
annot_directory = 'CALTECH/CALTECH_Annotations'
# Run the function to convert all the MAT files to a Pandas DataFrame
labels_df = mat_to_csv(annot_directory, image_directory, classes_list)
# Saving the Pandas DataFrame as CSV File
labels_df.to_csv('labels.csv', index=False)
Now that we have the data ready in the required format, let’s proceed to perform the augmentations.
In this step, we will create a helper function that converts imgaug's bounding box objects into a pandas DataFrame.
# Function to convert imgaug bounding box objects into a DataFrame
def bounding_boxes_to_df(bounding_boxes_object):

    # Convert the BoundingBoxesOnImage object to an array
    bounding_boxes_array = bounding_boxes_object.to_xyxy_array()

    # Convert the array into a DataFrame
    df_bounding_boxes = pd.DataFrame(bounding_boxes_array,
                                     columns=['xmin', 'ymin', 'xmax', 'ymax'])

    # Return the DataFrame
    return df_bounding_boxes
Initialize the Augmentations
Initialize the imgaug library with the augmentations we are going to apply to the images. imgaug selects n random augmentations from the list below and applies them to each image; for every image, a new random combination is used. You can set the value of n below as needed. Here, I'm setting it to 2.
# Define all the augmentations you want to apply to your dataset.
# n random augmentations (here n = 2) will be picked for each image.
image_augmentations = iaa.SomeOf(2,
    [
        # Scale the images
        iaa.Affine(scale=(0.5, 1.5)),

        # Rotate the images
        iaa.Affine(rotate=(-60, 60)),

        # Shift the images
        iaa.Affine(translate_percent={"x": (-0.3, 0.3), "y": (-0.3, 0.3)}),

        # Flip the images horizontally
        iaa.Fliplr(1),

        # Increase or decrease the brightness
        iaa.Multiply((0.5, 1.5)),

        # Add Gaussian blur
        iaa.GaussianBlur(sigma=(1.0, 3.0)),

        # Add Gaussian noise
        iaa.AdditiveGaussianNoise(scale=(0.03 * 255, 0.05 * 255))
    ])
As you can see, we are not only adding different types of augmentations but also setting a range for each of them. You can change these values; for more information on the augmentation options in imgaug, refer to this.
Create a Function to Apply Augmentations
This function finally applies the augmentations to both the images and the bounding boxes. It reads the labels DataFrame we created earlier to obtain the bounding box information for each image, and as it augments an image it also updates the bounding box coordinates, so they remain correct after the image is transformed.
def image_aug(df, images_path, aug_images_path, augmentor, multiple=3):

    # DataFrame that will be filled with the augmented image attributes
    augmentations_df = pd.DataFrame(
        columns=['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax'])

    # Group the data by filenames
    grouped_df = df.groupby('filename')

    # Create the directory for all augmented images
    if not os.path.exists(aug_images_path):
        os.mkdir(aug_images_path)

    # Create directories for each class of augmented images
    for folder in df['class'].unique():
        if not os.path.exists(os.path.join(aug_images_path, folder)):
            os.mkdir(os.path.join(aug_images_path, folder))

    for i in range(multiple):

        # Postfix added to each augmented copy of the same image
        image_postfix = str(i)

        # Loop to perform the augmentations
        for filename in df['filename'].unique():

            augmented_path = os.path.join(aug_images_path, filename) + image_postfix + '.jpg'

            # Take one image at a time with its information
            single_image = grouped_df.get_group(filename)
            single_image = single_image.reset_index()
            single_image = single_image.drop(['index'], axis=1)

            # Read the image
            image = imageio.imread(os.path.join(images_path, filename))

            # Get the bounding box
            bounding_box_array = single_image.drop(['filename', 'width', 'height', 'class'], axis=1).values

            # Give the bounding box to the imgaug library
            bounding_box = BoundingBoxesOnImage.from_xyxy_array(bounding_box_array, shape=image.shape)

            # Perform the random augmentations
            image_aug, bounding_box_aug = augmentor(image=image, bounding_boxes=bounding_box)

            # Discard bounding boxes that fall completely outside the image
            bounding_box_aug = bounding_box_aug.remove_out_of_image()

            # Clip bounding boxes that are only partially outside the image
            bounding_box_aug = bounding_box_aug.clip_out_of_image()

            # Skip the image if its bounding box was discarded
            if re.findall('Image...', str(bounding_box_aug)) == ['Image([]']:
                pass
            else:
                # Save the augmented image file
                imageio.imwrite(augmented_path, image_aug)

                # Update the image width and height after augmentation
                info_df = single_image.drop(['xmin', 'ymin', 'xmax', 'ymax'], axis=1)
                for index, _ in info_df.iterrows():
                    info_df.at[index, 'width'] = image_aug.shape[1]
                    info_df.at[index, 'height'] = image_aug.shape[0]

                # Append the postfix to each filename to keep augmented copies distinct
                info_df['filename'] = info_df['filename'].apply(lambda x: x + image_postfix + '.jpg')

                # Create the augmented bounding boxes DataFrame
                bounding_box_df = bounding_boxes_to_df(bounding_box_aug)

                # Concatenate the filenames, height, width and bounding boxes
                aug_df = pd.concat([info_df, bounding_box_df], axis=1)

                # Add all the information to the augmentations_df we initialized above
                augmentations_df = pd.concat([augmentations_df, aug_df])

    # Remove the index
    augmentations_df = augmentations_df.reset_index()
    augmentations_df = augmentations_df.drop(['index'], axis=1)

    # Return the DataFrame
    return augmentations_df
augmented_images_df = image_aug(labels_df, image_directory, 'aug_images',
image_augmentations)
With the augmentations applied, we sort the resulting DataFrame, save it as a CSV file, and check the dataset size.
augmented_images_df = augmented_images_df.sort_values('filename', ignore_index= True)
augmented_images_df.to_csv('aug.csv')
# Check Dataset Size
print('Our total dataset Size before the augmentations was: ', len(labels_df))
print('Our total dataset Size after the augmentations is: ', len(augmented_images_df))
Our total dataset size before the augmentations was: 218
Our total dataset size after the augmentations is: 654
Step 3: Preprocess the Dataset
Now we will convert the class labels to one-hot encoded vectors and scale the bounding boxes to the range 0-1, because it's easier for the network to predict values in a fixed range for each image. We will also be using the sigmoid activation function for the localization/regression head, and sigmoid outputs values between 0 and 1.
We will also preprocess the image by resizing it to a fixed size, converting it to RGB channel format, and normalizing it by dividing it by 255.0. This will help the network in training.
def preprocess_dataset(image_folder, classes_list, df, image_size=300):

    # Lists that will contain the whole dataset
    labels = []
    boxes = []
    img_list = []

    # Get the height and width of each image in the dataframe
    h = df['height']
    w = df['width']

    # Create a copy of the labels in the dataframe
    labels = list(df['class'])

    # Create a copy of the bounding box values and normalize them to 0-1
    for x1, y1, x2, y2 in zip(list(df['xmin']/w), list(df['ymin']/h),
                              list(df['xmax']/w), list(df['ymax']/h)):
        arr = [x1, y1, x2, y2]
        boxes.append(arr)

    # Loop over each class and its images
    for class_folder in classes_list:

        # Set our images directory
        image_dir = os.path.join(image_folder, class_folder)

        # Image files
        img_files = sorted(os.listdir(image_dir))

        # Loop over each image
        for image_file in img_files:

            # Full path to the image
            img_path = os.path.join(image_dir, image_file)

            # Read the image
            img = cv2.imread(img_path)

            # Resize all images to a fixed size
            image = cv2.resize(img, (image_size, image_size))

            # Convert the image from BGR to RGB as NASNetMobile was trained on RGB images
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            # Normalize the image by dividing it by 255.0
            image = image.astype("float") / 255.0

            # Append it to the list of images
            img_list.append(image)

    return labels, boxes, img_list
Now call the preprocessing function.
# All images will be resized to 300 x 300
image_size = 300
# Get Augmented images and bounding boxes
labels, boxes, img_list = preprocess_dataset('aug_images', classes_list, augmented_images_df)
After preprocessing, we one-hot encode the labels and shuffle the data.
# Convert labels to integers, then one hot encode them
label_encoder = LabelEncoder()
integer_labels = label_encoder.fit_transform(labels)
onehot_labels = to_categorical(integer_labels)
# Now we need to shuffle the data, so zip all lists and shuffle
combined_list = list(zip(img_list, boxes, onehot_labels))
random.shuffle(combined_list)
# Extract back the contents of each list
img_list, boxes, onehot_labels = zip(*combined_list)
print('All Done')
Step 4: Visualize the Data with its Annotations
Here we will pick some random images from each class and draw the associated bounding boxes over them. This way you'll be able to visualize the data and make sure that the preprocessing steps were performed correctly.
# Create a Matplotlib figure
plt.figure(figsize=(20, 20))

# Generate a random sample of images each time the cell is run
random_range = random.sample(range(1, len(img_list)), 20)

for iteration, i in enumerate(random_range, 1):

    # Bounding box of each image
    a1, b1, a2, b2 = boxes[i]

    # Rescale the bounding box values to match the image size
    x1 = a1 * image_size
    x2 = a2 * image_size
    y1 = b1 * image_size
    y2 = b2 * image_size

    # The image to visualize
    image = img_list[i]

    # Draw the bounding box on the image
    cv2.rectangle(image, (int(x1), int(y1)),
                  (int(x2), int(y2)), (0, 255, 0), 3)

    # Clip the values to 0-1 and draw the sample of images
    image = np.clip(img_list[i], 0, 1)
    plt.subplot(4, 5, iteration)
    plt.imshow(image)
    plt.axis('off')
Step 5: Split the Data into Train and Validation Set
We now have 3 lists: one containing all images, a second containing the class labels in one-hot encoded format, and a third containing the scaled bounding box coordinates. Let's split our data into a training and a validation set. It's important to shuffle the data before the split, which we have already done.
# Split the data of images, labels and their annotations
train_images, val_images, train_labels, \
val_labels, train_boxes, val_boxes = train_test_split( np.array(img_list),
np.array(onehot_labels), np.array(boxes), test_size = 0.1,random_state = 43)
print('Total Training Images: {}, Total Test Images: {}'.format(
len(train_images),
len(val_images)
))
Total Training Images: 392, Total Test Images: 44
Step 6: Construct the Multi-Output Model
Now it’s time to create our multi-output model. We won’t be training from scratch; instead we will use transfer learning. The model we will use is NASNetMobile, a really efficient model with a good balance of speed and accuracy. It was created through Neural Architecture Search (NAS), an emerging field of AutoML.
We’ll first download the model but exclude the top since we’ll be adding our own custom top.
# Load the NasNetMobile Model, make sure to exclude the top for transfer learning
N_mobile = tf.keras.applications.NASNetMobile( input_tensor = Input(
shape=(image_size, image_size, 3)),
include_top=False,
weights='imagenet'
)
# Let's create a function that will construct our model
def create_model(no_of_classes):

    # Freeze the whole base model
    N_mobile.trainable = False

    # Start by taking the output feature maps from NASNetMobile
    base_model_output = N_mobile.output

    # Convert to a single-dimensional vector with Global Average Pooling.
    # We could also use Flatten(), but GAP is more effective: it reduces
    # parameters and controls overfitting.
    flattened_output = GlobalAveragePooling2D()(base_model_output)

    # Create our classification head; the final layer has
    # output units = number of classes
    class_prediction = Dense(256, activation="relu")(flattened_output)
    class_prediction = Dense(128, activation="relu")(class_prediction)
    class_prediction = Dropout(0.2)(class_prediction)
    class_prediction = Dense(64, activation="relu")(class_prediction)
    class_prediction = Dropout(0.2)(class_prediction)
    class_prediction = Dense(32, activation="relu")(class_prediction)
    class_prediction = Dense(no_of_classes, activation='softmax',
                             name="class_output")(class_prediction)

    # Create our localization head; the final layer has 4 nodes for
    # x1, y1, x2, y2 respectively
    box_output = Dense(256, activation="relu")(flattened_output)
    box_output = Dense(128, activation="relu")(box_output)
    box_output = Dropout(0.2)(box_output)
    box_output = Dense(64, activation="relu")(box_output)
    box_output = Dropout(0.2)(box_output)
    box_output = Dense(32, activation="relu")(box_output)
    box_predictions = Dense(4, activation='sigmoid',
                            name="box_output")(box_output)

    # Now combine the two heads
    model = Model(inputs=N_mobile.input, outputs=[box_predictions, class_prediction])

    return model

# Create the model for our 3 classes: elephant, butterfly, cougar_face
model = create_model(3)
print("Model Created")
Check Model’s Structure:
Using the plot_model function you can check the structure of the final model. This is really helpful when you are creating a complex network and you want to make sure you have constructed the network correctly.
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
When you run the above code, you will get the graph image below. This plot will also be saved to your disk as model_plot.png.
Don’t worry about deciphering the whole network. The deep learning explosion started with networks like AlexNet, and every year brought new models such as VGG, Inception, ResNet, and other complex residual architectures. Along the way, new layer operations were introduced as well, such as Batch Normalization for faster, more stable training, Dropout to combat overfitting, Separable Convolutions, and many more.
Today we’re living in a time where architectures created from Neural Architecture Search achieve state of the art levels in accuracy and performance.
Here’s the expanded part of the graph that we created.
So upon looking at the above graph we can say that the network is structured as intended. We are now ready to proceed to the next step.
Step 7: Compile & Train the Model
There are a couple of things you need to do before we compile and train the model. For a single-output model, you just define a loss, a metric, and an optimizer, and compile the model. Since we're dealing with a multi-output model that outputs two totally different things, we need a separate loss and separate metrics for each of these output branches.
Remember that when we were creating the model, we assigned names to our two output branches. Because of this, we can refer to an output branch by its name and assign it a specific loss or metric, using a Python dictionary.
# For each head we will define a different loss; we specify them
# in a dictionary keyed by output name.
# For classification we will use categorical cross-entropy,
# for the bounding boxes we will use mean squared error.
losses = {
    "box_output": "mean_squared_error",
    "class_output": "categorical_crossentropy"
}

# Here you can give more or less weight to each loss.
# If you think that localization is harder than classification, you can
# try assigning it more weight.
loss_weights = {
    "box_output": 1.0,
    "class_output": 1.0
}

# Set the metrics:
# for the class labels we want to know the accuracy,
# and for the bounding boxes we track the mean squared error.
metrics = {
    'class_output': 'accuracy',
    'box_output': 'mse'
}

# We will use early stopping to stop training if the total validation loss
# does not decrease by 0.0001 within 40 epochs
stop = EarlyStopping(monitor="val_loss", min_delta=0.0001, patience=40,
                     restore_best_weights=True)

# Reduce the learning rate when the validation loss plateaus to boost learning
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.0002,
                              patience=30, min_lr=1e-7, verbose=1)

# Initialize the optimizer
opt = SGD(lr=1e-3, momentum=0.9)

# Compile the model with the SGD optimizer
model.compile(optimizer=opt, loss=losses, loss_weights=loss_weights,
              metrics=metrics)
Start Training:
When dealing with a multi-output model, you pass the labels for each output branch separately, again using a dictionary keyed by the output names.
# Train the Model
history = model.fit(x = train_images,
y= {
"box_output": train_boxes,
"class_output": train_labels
},
validation_data=(
val_images,
{
"box_output": val_boxes,
"class_output": val_labels
}), batch_size = 32, epochs = 500,
callbacks=[reduce_lr, stop])
Step 8: Plot Model’s Loss & Accuracy Curves
Since we have separate losses and separate metrics for the different branches, plus a validation set on top of that, we are now dealing with 10 logged quantities in total. Let's visualize and understand them by plotting them.
def plot(var1, var2, plot_name):

    # Get the metrics recorded during training
    c1 = history.history[var1]
    c2 = history.history[var2]
    epochs = range(len(c1))

    # Plot the metrics
    plt.plot(epochs, c1, 'b', label=var1)
    plt.plot(epochs, c2, 'r', label=var2)
    plt.title(str(plot_name))
    plt.legend()
plot( 'class_output_accuracy', 'val_class_output_accuracy', 'Class Output Accuracy vs Class Validation Accuracy')
plot( 'class_output_loss', 'val_class_output_loss', 'Class Output Loss vs Class Validation Loss')
plot( 'box_output_mse', 'val_box_output_mse', 'Box Output MSE vs Box Validation MSE')
plot('box_output_loss', 'val_box_output_loss', 'Box Output Loss vs Box Validation Loss')
# This is the most important plot; you can just take a look at this one
plot('loss','val_loss', ' Total Loss vs Total Validation Loss')
Code to save your model
# Save your model here in .h5 format.
model.save('caltech.h5')
# Load the saved model
# model = load_model('caltech.h5')
Step 9: Test your model on new images
In this step, we will create the functions needed to run the trained model on new images.
# Enter your class names in this list
global label_names
# Must be same as the Annotations list we used
label_names = sorted(classes_list)
Predict Function
This function runs the model on a new input image and draws the predicted label and bounding box on it.
# We will use this function to make predictions on images.
def predict(image, returnimage=False, scale=0.9):

    # Before we can make a prediction we need to preprocess the image
    processed_image = preprocess(image)

    # Now we can use our model for prediction
    results = model.predict(processed_image)

    # Postprocess the results to get the label, box and confidence
    label, (x1, y1, x2, y2), confidence = postprocess(image, results)

    # Annotate the image with the predicted box and label
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 100), 2)
    cv2.putText(image, '{}'.format(label),
                (x1, y2 + int(35 * scale)),
                cv2.FONT_HERSHEY_COMPLEX, scale,
                (200, 55, 100), 2)

    # Show the image with matplotlib
    plt.figure(figsize=(10, 10))
    plt.imshow(image[:, :, ::-1])
Preprocessing Function
This function will preprocess new images the same way we did initially, before training.
# This function will preprocess images the same way as during training.
def preprocess(img, image_size=300):

    # Resize, convert BGR to RGB and normalize to 0-1
    image = cv2.resize(img, (image_size, image_size))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype("float") / 255.0

    # Expand dimensions as predict expects images in batches
    image = np.expand_dims(image, axis=0)
    return image
Post Processing Function
After the prediction, you need postprocessing to extract the class label and the real bounding box coordinates.
def postprocess(image, results):

    # Split the results into box coordinates and class probabilities
    bounding_box, class_probs = results

    # First let's get the class label:
    # the index of the class with the highest confidence is our target class
    class_index = np.argmax(class_probs)

    # Use this index to get the class name
    class_label = label_names[class_index]

    # Now extract the bounding box.
    # Get the height and width of the actual image
    h, w = image.shape[:2]

    # Extract the coordinates
    x1, y1, x2, y2 = bounding_box[0]

    # Convert the coordinates from relative (i.e. 0-1) to absolute values
    x1 = int(w * x1)
    x2 = int(w * x2)
    y1 = int(h * y1)
    y2 = int(h * y2)

    # Return the label, coordinates and class probabilities
    return class_label, (x1, y1, x2, y2), class_probs
Now that we have created all the required functions, let’s test our model on new images.
!wget -q -O elephant.jpg https://scx2.b-cdn.net/gfx/news/2020/zoomthebabye.jpg
image = cv2.imread('/content/elephant.jpg' )
predict(image, scale = 1)
!wget -q -O butterfly.jpg https://www.sciencemag.org/butterfly.jpg
image = cv2.imread('/content/butterfly.jpg' )
predict(image)
!wget -q -O elephant2.jpg https://elephants.com/elephant.jpg
image = cv2.imread('/content/elephant2.jpg' )
predict(image)
!wget -q -O butterfly2.jpg https://i.pinimg.com/butterfly.jpg
image = cv2.imread('/content/butterfly2.jpg' )
predict(image)
!wget -q -O cougar2.jpg https://encrypted-tbn0.gstatic.com/image.jpg
image = cv2.imread('/content/cougar2.jpg' )
predict(image, scale = 0.5)
!wget -q -O butterfly3.jpg https://www.naturepl.com/cache/pcache2/00525060.jpg
image = cv2.imread('/content/butterfly3.jpg' )
predict(image)
Summary & Conclusion:
In this tutorial we learned:
- that there is another problem, called Classification with Localization, that lies between image classification and object detection.
- how to perform different kinds of augmentations.
- how to create a multi-output model to output two totally different features.
- how to convert any image classifier to a detector just by adding an extra head.
- how to work with multiple losses and metrics and evaluate them by creating plots.
Here are some drawbacks of the discussed approach.
- The approach is limited by the fact that it can only handle one object per image.
- For the bounding box, we are monitoring mean squared error, which helps but is not the most informative metric. Ideally, we would measure the Intersection over Union (IoU) score (a sketch of such a metric is shown after this list).
- This setup is, at best, a hack for localization. For serious detection problems, I would strongly recommend training a state-of-the-art Object Detector using the TFOD API; proper Object Detection models give comparatively better performance.
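For reference, here is a minimal sketch (not part of the original code) of an IoU metric that could be monitored alongside, or instead of, MSE; it assumes boxes in (x1, y1, x2, y2) format scaled to 0-1, as used in this tutorial:
import tensorflow as tf

# Minimal IoU metric sketch. Assumes y_true and y_pred are (batch, 4) tensors
# of (x1, y1, x2, y2) coordinates in the 0-1 range.
def iou_metric(y_true, y_pred):
    # Make sure both tensors have the same dtype
    y_true = tf.cast(y_true, y_pred.dtype)

    # Intersection rectangle
    inter_x1 = tf.maximum(y_true[:, 0], y_pred[:, 0])
    inter_y1 = tf.maximum(y_true[:, 1], y_pred[:, 1])
    inter_x2 = tf.minimum(y_true[:, 2], y_pred[:, 2])
    inter_y2 = tf.minimum(y_true[:, 3], y_pred[:, 3])
    inter_area = tf.maximum(inter_x2 - inter_x1, 0.0) * tf.maximum(inter_y2 - inter_y1, 0.0)

    # Areas of the ground-truth and predicted boxes
    true_area = (y_true[:, 2] - y_true[:, 0]) * (y_true[:, 3] - y_true[:, 1])
    pred_area = (y_pred[:, 2] - y_pred[:, 0]) * (y_pred[:, 3] - y_pred[:, 1])
    union_area = true_area + pred_area - inter_area

    return tf.reduce_mean(inter_area / (union_area + 1e-7))

# It could then be added to the metrics dictionary, for example:
# metrics = {'class_output': 'accuracy', 'box_output': iou_metric}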
If you enjoyed this tutorial then be sure to drop a comment.
We would love to hear from you. Ask any questions you want, we would be happy to help!