• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer

Learn OpenCV

OpenCV, PyTorch, Keras, Tensorflow examples and tutorials

  • Home
  • Getting Started
    • Installation
    • PyTorch
    • Keras & Tensorflow
    • Resource Guide
  • Courses
    • Opencv Courses
    • CV4Faces (Old)
  • Resources
  • AI Consulting
  • About

CNN Fully Convolutional Image Classification with TensorFlow

Anastasia Murzova
July 13, 2020 Leave a Comment
Deep Learning Feature Detection Image Classification Image Processing Keras Object Detection Tensorflow

July 13, 2020 By Leave a Comment

In a previous post, we had covered the concept of fully convolutional neural networks (FCN) in PyTorch, where we showed how we can solve the classification task using the input image of arbitrary size.

We received several requests for the same post in Tensorflow (TF). By popular demand, in this post we implement the concept using TF

Download code

Download Code To easily follow along this tutorial, please download code by clicking on the button below. It's FREE!

Download Code

TensorFlow Fully Convolutional Neural Network

Let’s start with a brief recap of what Fully Convolutional Neural Networks are.

Fully connected layers (FC) impose restrictions on the size of model inputs. If you have used classification networks, you probably know that you have to resize and/or crop the image to a fixed size (e.g. 224×224).

To feed an arbitrary-sized image into the network we need to replace all FC layers with convolutional layers, which do not require a fixed input size.

In the previous fully convolutional network implementation we used a pre-trained PyTorch ResNet-18 network as a baseline for its further modification into a fully convolutional network.

Fully Convolutional ResNet-50

We wanted to replicate the above implementation inTensorflow. However, ResNet-18 is not available in TensorFlow as tensorflow.keras.applications contains pre-trained ResNet models starting with a 50-layer version of ResNet. That’s why in the current post we will experiment with ResNet-50.

This network expects an input image of size 224×224×3. Before we start the ResNet-50 transformation into a fully convolutional network, let’s review its architecture.

Each ResNet-50 block is 3-layer deep, whereas ResNet-18 blocks are 2-layer deep.

Figure 1: Overview of the ResNet-18 and ResNet-50 blocks

You can see in Figure 1, the first layer in the ResNet-50 architecture is convolutional, which is followed by a pooling layer or MaxPooling2D in the TensorFlow implementation (see the code below). This, in turn, is followed by 4 convolutional blocks containing 3, 4, 6 and 3 convolutional layers.

Finally, we have a global average pooling layer called as GlobalAveragePooling2D in the code. The output of this layer is flattened and fed to the final fully connected layer denoted by Dense. However, there is also another option in TensorFlow ResNet50 implementation regulated by its parameter include_top. When it is set to True, which is the default behaviour, our model keeps the last fully connected layer. If we set this value to False the last fully connected layer will be excluded. Another parameter such as pooling, can be used in case, when include_top is set to False. If pooling is None the model will return the output from the last convolutional block, if it is avg then global average pooling will be applied to the output, and if it is set to max – global max pooling will be used instead.

The below code was snipped from the resnet50.py file – the ResNet-50 realization in TensorFlow adapted from tf.keras.applications.ResNet50. You can compare its architecture with the table above.

# ResNet50 initial function
def ResNet50(include_top=True,
             weights='imagenet',
             input_tensor=None,
             input_shape=None,
             pooling=None,
             classes=1000,
             **kwargs):
  """Instantiates the ResNet50 architecture."""

  def stack_fn(x):
    x = stack1(x, 64, 3, stride1=1, name='conv2')
    x = stack1(x, 128, 4, name='conv3')
    x = stack1(x, 256, 6, name='conv4')
    return stack1(x, 512, 3, name='conv5')

  return ResNet(stack_fn, False, True, 'resnet50', include_top, 
      weights, input_tensor, input_shape, pooling, classes, **kwargs)

# TF ResNet basic pipeline: ResNet50 case
def ResNet(stack_fn,
           preact,
           use_bias,
           model_name='resnet',
           include_top=True,
           weights='imagenet',
           input_tensor=None,
           input_shape=None,
           pooling=None,
           classes=1000,
           classifier_activation='softmax',
           **kwargs):
    # ...
    x = layers.ZeroPadding2D(
        padding=((3, 3), (3, 3)),
        name='conv1_pad'
    )(img_input)
    x = layers.Conv2D(64, 7, strides=2, use_bias=use_bias, 
        name='conv1_conv')(x)

    x = layers.BatchNormalization(
        axis=bn_axis,
        epsilon=1.001e-5,
        name='conv1_bn'
    )(x)
    x = layers.Activation('relu', name='conv1_relu')(x)

    x = layers.ZeroPadding2D(
        padding=((1, 1), (1, 1)),
        name='pool1_pad'
    )(x)
    x = layers.MaxPooling2D(3, strides=2, name='pool1_pool')(x)

    # residual stacked block sequence
    x = stack_fn(x)

    x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
    imagenet_utils.validate_activation(
        classifier_activation,
        weights
    )
    x = layers.Dense(
        classes,
        activation=classifier_activation,
        name='predictions'
    )(x)
    # ...

Now we are going to create a new FullyConvolutionalResnet50 function as the baseline for further receptive field calculation:

def fully_convolutional_resnet50(
    input_shape, num_classes=1000, pretrained_resnet=True, 
    use_bias=True,
):
    # init input layer
    img_input = Input(shape=input_shape)

    # define basic model pipeline
    x = ZeroPadding2D(padding=((3, 3), (3, 3)),
        name="conv1_pad")(img_input)
    x = Conv2D(64, 7, strides=2, use_bias=use_bias,
        name="conv1_conv")(x)
    x = BatchNormalization(axis=3, epsilon=1.001e-5,
        name="conv1_bn")(x)
    x = Activation("relu", name="conv1_relu")(x)

    x = ZeroPadding2D(padding=((1, 1), (1, 1)),
        name="pool1_pad")(x)
    x = MaxPooling2D(3, strides=2, name="pool1_pool")(x)

    # the sequence of stacked residual blocks
    x = stack1(x, 64, 3, stride1=1, name="conv2")
    x = stack1(x, 128, 4, name="conv3")
    x = stack1(x, 256, 6, name="conv4")
    x = stack1(x, 512, 3, name="conv5")

    # add avg pooling layer after feature extraction layers
    x = AveragePooling2D(pool_size=7)(x)

    # add final convolutional layer
    conv_layer_final = Conv2D(
        filters=num_classes,
        kernel_size=1,
        use_bias=use_bias,
        name="last_conv",
    )(x)

    # configure fully convolutional ResNet50 model
    model = training.Model(img_input, x)

    # load model weights
    if pretrained_resnet:
        model_name = "resnet50"
        # configure full file name
        file_name = model_name + 
            "_weights_tf_dim_ordering_tf_kernels_notop.h5"
        # get the file hash from TF WEIGHTS_HASHES
        file_hash = WEIGHTS_HASHES[model_name][1]
        weights_path = data_utils.get_file(
            file_name,
            BASE_WEIGHTS_PATH + file_name,
            cache_subdir="models",
            file_hash=file_hash,
        )

        model.load_weights(weights_path)

    # form final model
    model = training.Model(inputs=model.input, outputs= 
        [conv_layer_final])

    if pretrained_resnet:
        # get model with the dense layer for further FC weights 
        extraction
        resnet50_extractor = ResNet50(
            include_top=True,
            weights="imagenet",
            classes=num_classes,
        )
        # set ResNet50 FC-layer weights to final conv layer
        set_conv_weights(
            model=model,
            feature_extractor=resnet50_extractor
        )
    return model

It’s worth noting that the FC layer was converted to the convolutional layer by copying weights and biases from the TF ResNet50 last Dense layer. This process is shown below:

# setting FC weights to the final convolutional layer
def set_conv_weights(model, feature_extractor):
    # get pre-trained ResNet50 FC weights
    dense_layer_weights = feature_extractor.layers[-1].get_weights()
    weights_list = [
        tf.reshape(
            dense_layer_weights[0], (1, 1, *dense_layer_weights[0].shape),
        ).numpy(),
        dense_layer_weights[1],
    ]
    model.get_layer(name="last_conv").set_weights(weights_list)

TensorFlow Fully Convolutional Network Results

Let’s check model predictions on a previously used camel input image.

Figure 2: Model input image

The first step is image reading and initial preprocessing:

# read image
    original_image = cv2.imread("camel.jpg")

    # convert image to the RGB format
    image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # pre-process image
    image = preprocess_input(image)

    # convert image to NCHW tf.tensor
    image = tf.expand_dims(image, 0)

    # load modified pre-trained resnet50 model
    model = fully_convolutional_resnet50(
        input_shape=(image.shape[-3:])
    )

We use preprocess_input function to get the proper image input, that was used to train the original model. What it actually does is simply subtracting the mean pixel value [103.939, 116.779, 123.68] from each pixel:

Figure 3: Preprocessed image

Now all we have to do is to forward pass our input and post-process the input to obtain the response map:

# load modified resnet50 model with pre-trained ImageNet weights
    model = fully_convolutional_resnet50(
        input_shape=(image.shape[-3:])
    )

    # Perform inference.
    # Instead of a 1×1000 vector, we will get a
    # 1×1000×n×m output ( i.e. a probability map
    # of size n × m for each 1000 class,
    # where n and m depend on the size of the image).
    preds = model.predict(image)
    preds = tf.transpose(preds, perm=[0, 3, 1, 2])
    preds = tf.nn.softmax(preds, axis=1)
    print("Response map shape : ", preds.shape)

    # find the class with the maximum score in the n × m output map
    pred = tf.math.reduce_max(preds, axis=1)
    class_idx = tf.math.argmax(preds, axis=1)
    print(class_idx)

    row_max = tf.math.reduce_max(pred, axis=1)
    row_idx = tf.math.argmax(pred, axis=1)

    col_idx = tf.math.argmax(row_max, axis=1)

    predicted_class = tf.gather_nd(
        class_idx,
        (0, tf.gather_nd(row_idx, (0, col_idx[0])), col_idx[0]),
    )

    # print top predicted class
    print(
        "Predicted Class : ",
        labels[predicted_class], 
        predicted_class
    )

After running the code above, we will receive the following output:

Response map shape :  (1, 1000, 3, 8)
tf.Tensor(
[[[978 437 437 437 437 978 354 975]
  [978 354 354 354 354 354 354 735]
  [978 977 977 977 977 273 354 354]]], shape=(1, 3, 8), dtype=int64)
Predicted Class :  Arabian camel, dromedary, Camelus dromedarius tf.Tensor(354, shape=(), dtype=int64)

The initial size of the forward passed through the network image was 1920×725×3. As an output we received a response map of size [1, 1000, 3, 8], where 1000 is the number of classes. As we remember from the previous post, the result can be interpreted as the inference performed on 3 × 8 = 24 locations on the image by obtained sliding window of size 224×224 (the input image size for the original network).

In the predicted class line the value of 354 depicts the number of the predicted imagenet class: ‘Arabian camel’ (354). The visualization of model results:

Figure 4: Response map

The response map depicts the regions of a high likelihood of the predicted class. Notice, that the strongest response is in the camel area, which, however, comes along with the response in the region of pyramids.

In the final stage the area with the highest response was highlighted with a detection box, created by thresholding the obtained response map:

score_map = cv2.cvtColor(score_map, cv2.COLOR_GRAY2BGR)
masked_image = (original_image * score_map).astype(np.uint8)

# display bounding box
cv2.rectangle(
    masked_image, rect[:2], (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 255), 2,
)

The output is:

Figure 5: Fully convolutional ResNet50 results

Subscribe & Download Code

If you liked this article and would like to download code (C++ and Python) and example images used in this post, please subscribe to our newsletter. You will also receive a free Computer Vision Resource Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.

Subscribe Now


Tags: classification fully convolutional Fully Convolutional Network (FCN) Image Classification imageNet Keras resnet50 Tensorflow

Filed Under: Deep Learning, Feature Detection, Image Classification, Image Processing, Keras, Object Detection, Tensorflow

About

I am an entrepreneur with a love for Computer Vision and Machine Learning with a dozen years of experience (and a Ph.D.) in the field.

In 2007, right after finishing my Ph.D., I co-founded TAAZ Inc. with my advisor Dr. David Kriegman and Kevin Barnes. The scalability, and robustness of our computer vision and machine learning algorithms have been put to rigorous test by more than 100M users who have tried our products. Read More…

Getting Started

  • Installation
  • PyTorch
  • Keras & Tensorflow
  • Resource Guide

Resources

Download Code (C++ / Python)

ENROLL IN OFFICIAL OPENCV COURSES

I've partnered with OpenCV.org to bring you official courses in Computer Vision, Machine Learning, and AI.
Learn More

Recent Posts

  • RAFT: Optical Flow estimation using Deep Learning
  • Making A Low-Cost Stereo Camera Using OpenCV
  • Optical Flow in OpenCV (C++/Python)
  • Introduction to Epipolar Geometry and Stereo Vision
  • Depth Estimation using Stereo matching

Disclaimer

All views expressed on this site are my own and do not represent the opinions of OpenCV.org or any entity whatsoever with which I have been, am now, or will be affiliated.

GETTING STARTED

  • Installation
  • PyTorch
  • Keras & Tensorflow
  • Resource Guide

COURSES

  • Opencv Courses
  • CV4Faces (Old)

COPYRIGHT © 2020 - BIG VISION LLC

Privacy Policy | Terms & Conditions

We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.AcceptPrivacy policy