Change input shape dimensions for fine-tuning with Keras

In this tutorial, you will learn how to change the input shape tensor dimensions for fine-tuning using Keras. After going through this guide you'll understand how to apply transfer learning to images with different image dimensions than what the CNN was originally trained on.

A couple of weeks ago I published a tutorial on transfer learning with Keras and deep learning. Soon after the tutorial was published, I received a question from Francesca Maepa, who asked the following:

Do you know of a good blog or tutorial that shows how to implement transfer learning on a dataset that has a smaller shape than the pre-trained model?

I created a good pre-trained model, and would like to use some features from the pre-trained model and transfer them to a target domain that is missing certain feature training datasets, and I'm not sure if I'm doing it right.

Francesca asks an amazing question.

We often think of Convolutional Neural Networks as accepting fixed-size inputs (i.e., 224×224, 227×227, 299×299, etc.).

But what if you needed to:

  1. Utilize a pre-trained network for transfer learning…
  2. …and then update the input shape dimensions to accept images with different dimensions than what the original network was trained on?

Why might you want to utilize different image dimensions?

There are two common reasons:

  • Your input image dimensions are considerably smaller than what the CNN was trained on, and increasing their size introduces too many artifacts and dramatically hurts loss/accuracy.
  • Your images are high resolution and contain small objects that are hard to detect. Resizing to the original input dimensions of the CNN hurts accuracy, and you postulate that increasing resolution will help improve your model.

In these scenarios, you would wish to update the input shape dimensions of the CNN and then be able to perform transfer learning.

The question then becomes: is such an update possible?

Yes, in fact, it is.

Change input shape dimensions for fine-tuning with Keras

In the first part of this tutorial, we'll discuss the concept of an input shape tensor and the role it plays with input image dimensions to a CNN.

From there we'll discuss the example dataset we'll be using in this blog post. I'll then show you how to:

  1. Update the input image dimensions for a pre-trained CNN using Keras.
  2. Fine-tune the updated CNN.

Let's get started!

What’s an input shape tensor?

Figure 1: Convolutional Neural Networks built with Keras for deep learning have different input shape expectations. In this blog post, you'll learn how to change input shape dimensions for fine-tuning with Keras.

When working with Keras and deep learning, you've probably either utilized or run into code that loads a pre-trained network via:
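A minimal sketch of such a load (assuming the standalone `keras` package of that era; with newer releases, substitute `tensorflow.keras`):

```python
# load the VGG16 architecture with its ImageNet-trained weights,
# including the fully-connected (FC) layer head
from keras.applications import VGG16

model = VGG16(weights="imagenet")
```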

The code above initializes the VGG16 architecture and then loads the weights for the model (pre-trained on ImageNet).

We would typically use this code when our project needs to classify input images that have class labels inside ImageNet (as this tutorial demonstrates).

When performing transfer learning or fine-tuning, you may instead use the following code to leave off the fully-connected (FC) layer heads:
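A sketch of that variant (again assuming the standalone `keras` package):

```python
# load VGG16 with ImageNet weights but WITHOUT the FC layer head
from keras.applications import VGG16

model = VGG16(weights="imagenet", include_top=False)
```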

We're still indicating that the pre-trained ImageNet weights should be used, but now we're setting include_top=False, indicating that the FC head should not be loaded.

This code would typically be utilized when you're performing transfer learning either via feature extraction or fine-tuning.

Finally, we can update our code to include an input_tensor dimension:
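A sketch with an explicit input tensor:

```python
# load VGG16 without the FC head, explicitly specifying the
# 224x224x3 input tensor VGG16 was originally trained on
from keras.applications import VGG16
from keras.layers import Input

model = VGG16(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))
```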

We're still loading VGG16 with weights pre-trained on ImageNet, and we're still leaving off the FC layer heads… but now we're specifying an input shape of 224×224×3 (which are the input image dimensions that VGG16 was originally trained on, as seen in Figure 1, left).

That's all fine and good, but what if we now wanted to fine-tune our model on 128×128px images?

That's actually just a simple update to our model initialization:
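A sketch of the updated initialization; the only change from the previous snippet is the input shape:

```python
# same as before, but with the updated 128x128x3 input dimensions
from keras.applications import VGG16
from keras.layers import Input

model = VGG16(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(128, 128, 3)))
```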

Figure 1 (right) provides a visualization of the network with the updated input tensor dimensions. Notice how the input volume is now 128×128×3 (our updated, smaller dimensions) versus the previous 224×224×3 (the original, larger dimensions).

Updating the input shape dimensions of a CNN via Keras is that simple!

However, there are a few caveats to look out for.

Can I make the input dimensions anything I want?

Figure 2: Updating a Keras CNN's input shape is straightforward; however, there are a few caveats to take into account.

There are limits to how much you can update the image dimensions, both from an accuracy/loss perspective and from limitations of the network itself.

Consider the fact that CNNs reduce volume dimensions via two methods:

  1. Pooling (such as max-pooling in VGG16)
  2. Strided convolutions (such as in ResNet)

If your input image dimensions are too small, the CNN will naturally reduce volume dimensions during forward propagation and then effectively "run out" of data.

In that case your input dimensions are too small.

As an example of that scenario, when using 48×48 input images, I received the following error message:

Notice how Keras is complaining that our volume is too small. You'll encounter similar errors for other pre-trained networks as well. If you see one of these errors, you know you need to increase your input image dimensions.

You can also make your input dimensions too large.

You won't run into any errors per se, but you may see your network fail to obtain reasonable accuracy due to the fact that there aren't enough layers in the network to:

  1. Learn robust, discriminative filters.
  2. Naturally reduce volume size via pooling or strided convolution.

If that happens, you have a few options:

  • Explore other (pre-trained) network architectures that are trained on larger input dimensions.
  • Tune your hyperparameters exhaustively, focusing first on learning rate.
  • Add additional layers to the network. For VGG16 you would use 3×3 CONV layers and max-pooling. For ResNet you would include residual layers with strided convolution.

The final suggestion will require you to update the network architecture and then perform fine-tuning on the newly initialized layers.

To learn more about fine-tuning and transfer learning, along with my tips, suggestions, and best practices when training networks, be sure to refer to my book, Deep Learning for Computer Vision with Python.

Our example dataset

Figure 3: A subset of the Kaggle Dogs vs. Cats dataset is used for this Keras input shape example. Using a smaller dataset not only proves the point more quickly, but also allows nearly any computer hardware to be used (i.e., no expensive GPU machine/instance necessary).

The dataset we'll be using here today is a small subset of Kaggle's Dogs vs. Cats dataset.

We also use this dataset inside Deep Learning for Computer Vision with Python to teach the fundamentals of training networks, ensuring that readers with either CPUs or GPUs can follow along and learn best practices when training models.

The dataset itself contains 2,000 images belonging to 2 classes ("cat" and "dog"):

  • Cat: 1,000 images
  • Dog: 1,000 images

A visualization of the dataset can be seen in Figure 3 above.

In the remainder of this tutorial you'll learn how to take this dataset and:

  1. Update the input shape dimensions for a pre-trained CNN.
  2. Fine-tune the CNN with the smaller image dimensions.

Installing necessary packages

All of today's packages can be installed via pip.

I recommend that you create a Python virtual environment for today's project, but it isn't strictly required. To learn how to create a virtual environment quickly and to install OpenCV into it, refer to my pip install opencv tutorial.

To install the packages for today's project, simply enter the following commands:
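One plausible set of install commands (the exact package names are an assumption, inferred from the imports used later in this post):

```shell
$ pip install tensorflow
$ pip install keras
$ pip install opencv-contrib-python
$ pip install scikit-learn
$ pip install imutils
$ pip install matplotlib
```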

Project structure

Go ahead and grab the code + dataset from the "Downloads" section of today's blog post.

Once you've extracted the .zip archive, you can inspect the project structure using the tree command:

Our dataset is contained within the dogs_vs_cats_small/ directory. The two subdirectories contain images of our classes. If you're working with a different dataset, be sure the structure is <dataset>/<class_label>.

Today we'll be reviewing our training script, which generates plot.png containing our accuracy/loss curves.

Updating the input shape dimensions with Keras

It’s now time to update our input image dimensions with Keras and a pre-trained CNN.

Open up the training script in your project structure and insert the following code:
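A plausible reconstruction of those imports (the exact set is an assumption based on the packages enumerated below; the line numbers in the discussion refer to the full original script, not this sketch):

```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG16
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model
from keras.optimizers import Adam
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os
```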

Lines 2-20 import required packages:

  • keras and sklearn are for deep learning/machine learning. Be sure to refer to my in-depth deep learning book, Deep Learning for Computer Vision with Python, to become more familiar with the classes and functions we use from these tools.

  • paths from imutils traverses a directory and allows us to list all images in that directory.

  • matplotlib will allow us to plot our training accuracy/loss history.

  • numpy is a Python package for numerical operations; one of the ways we'll put it to work is for "mean subtraction", a scaling/normalization technique.

  • cv2 is OpenCV.

  • argparse will be used to read and parse command line arguments.

Let's go ahead and parse the command line arguments now:
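A sketch of that argument parsing (flag behavior is taken from the descriptions below; the sample invocation at the end is purely illustrative):

```python
import argparse

# construct the argument parser
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
ap.add_argument("-e", "--epochs", type=int, default=25,
    help="# of epochs to train for")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
    help="path to output loss/accuracy plot")

# in the real script this would simply be vars(ap.parse_args());
# an explicit argv list is used here purely for illustration
args = vars(ap.parse_args(["--dataset", "dogs_vs_cats_small"]))
```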

Our script accepts three command line arguments via Lines 23-30:

  • --dataset : The path to our input dataset. We're using a condensed version of Dogs vs. Cats, but you could use other binary, 2-class datasets with little or no modification as well (provided they follow a similar structure).

  • --epochs : The number of times we'll pass our data through the network during training; by default, we'll train for 25 epochs unless a different value is supplied.

  • --plot : The path to our output accuracy/loss plot. Unless otherwise specified, the file will be named plot.png and placed in the project directory. If you are conducting multiple experiments, be sure to give your plots a different name each time for future comparison purposes.

Next, we'll load and preprocess our images:
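A sketch of that loading loop (variable names such as imagePaths, data, and labels follow the discussion below; the hard-coded dataset path stands in for the --dataset argument):

```python
import os
import cv2
from imutils import paths

# grab all image paths in the dataset directory, then initialize
# the data and labels lists
imagePaths = list(paths.list_images("dogs_vs_cats_small"))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
    # extract the class label from the parent directory name
    label = imagePath.split(os.path.sep)[-2]

    # load the image, swap BGR -> RGB channel ordering, and resize
    # to 128x128 (NOT the 224x224 VGG16 was trained on)
    image = cv2.imread(imagePath)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (128, 128))

    # update the data and labels lists
    data.append(image)
    labels.append(label)
```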

First, we grab our imagePaths on Line 35 and then initialize our data and labels lists (Lines 36 and 37).

Lines 40-52 loop over the imagePaths, first extracting the labels. Each image is loaded, the color channels are swapped, and the image is resized. The images and labels are added to the data and labels lists, respectively.

VGG16 was trained on 224×224px images; however, I'd like to draw your attention to Line 48. Notice how we've resized our images to 128×128px. This resizing is an example of applying transfer learning to images with different dimensions.

Although Line 48 doesn't fully answer Francesca Maepa's question yet, we're getting close.

Let's go ahead and one-hot encode our labels as well as split our data:

Lines 55 and 56 convert our data and labels to NumPy array format.

Then, Lines 59-61 perform one-hot encoding on our labels. Essentially, this process converts our two labels ("cat" and "dog") to arrays indicating which label is active/hot. If a training image is representative of a dog, then the value would be [0, 1] where "dog" is hot. Otherwise, for a "cat", the value would be [1, 0].

To reinforce the point, if, for example, we had 5 classes of data, a one-hot encoded array might look like [0, 0, 0, 1, 0] where the 4th element is hot, indicating that the image is from the 4th class. For further details, please refer to Deep Learning for Computer Vision with Python.
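As a quick, standalone NumPy illustration of that encoding (not part of the training script):

```python
import numpy as np

# one-hot encode class index 3 out of 5 classes: row 3 of the
# 5x5 identity matrix is exactly [0, 0, 0, 1, 0]
num_classes = 5
class_index = 3
one_hot = np.eye(num_classes, dtype=int)[class_index]
print(one_hot)  # -> [0 0 0 1 0]
```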

Lines 65 and 66 mark 75% of our data for training and the remaining 25% for testing via the train_test_split function.

Let's now initialize our data augmentation generator. We'll also establish our ImageNet mean for mean subtraction:
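A plausible sketch of those lines (the specific augmentation parameters are my assumption; the RGB values are the standard ImageNet channel means):

```python
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")

# initialize the validation/testing data augmentation object
# (no augmentation -- mean subtraction only)
valAug = ImageDataGenerator()

# define the per-channel ImageNet mean (RGB ordering) and set it
# on both generators so each performs mean subtraction
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
trainAug.mean = mean
valAug.mean = mean
```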

Lines 69-76 initialize a data augmentation object for performing random manipulations on our input images during training.

Line 80 also takes advantage of the ImageDataGenerator class for validation, but without any augmentation parameters; we won't manipulate validation images with the exception of performing mean subtraction.

Both the training and validation/testing generators will conduct mean subtraction. Mean subtraction is a scaling/normalization technique proven to increase accuracy. Line 85 contains the mean for each respective RGB channel, while Lines 86 and 87 are then populated with the value. Later, our data generators will automatically perform the mean subtraction on our training/validation data.

Note: I've covered data augmentation in detail in this blog post as well as in the Practitioner Bundle of Deep Learning for Computer Vision with Python. Scaling and normalization techniques such as mean subtraction are covered in DL4CV as well.

We're performing transfer learning with VGG16. Let's initialize the base model now:
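A sketch of that initialization, mirroring the earlier snippet:

```python
from keras.applications import VGG16
from keras.layers import Input

# load VGG16, chopping off the FC head and specifying the new
# 128x128x3 input tensor
baseModel = VGG16(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(128, 128, 3)))
```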

Lines 92 and 93 load VGG16 with an input shape dimension of 128×128 using 3 channels.

Remember, VGG16 was originally trained on 224×224 images; now we're updating the input shape dimensions to handle 128×128 images.

Effectively, we've now fully answered Francesca Maepa's question! We accomplished changing the input dimensions via two steps:

  1. We resized all of our input images to 128×128.
  2. Then we set the input shape=(128, 128, 3).

Line 97 will print a model summary in our terminal so that we can inspect it. Alternatively, you can visualize the model graphically by studying Chapter 19, "Visualizing Network Architectures", of Deep Learning for Computer Vision with Python.

Since we're performing transfer learning, the include_top parameter is set to False (Line 92): we chopped off the top!

Now we're going to perform surgery by constructing a new head and suturing it onto the CNN:
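A sketch of the new head (the 512-unit FC layer and 50% dropout rate are my assumptions; the 2-unit softmax matches our two classes):

```python
from keras.layers import Flatten, Dense, Dropout
from keras.models import Model

# build a new FC head on top of the base model's output volume
headModel = baseModel.output
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# assemble the full model from the base + new head
model = Model(inputs=baseModel.input, outputs=headModel)

# freeze the base layers so they are NOT updated during training
for layer in baseModel.layers:
    layer.trainable = False
```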

Line 101 takes the output from the baseModel and sets it as input to the headModel.

From there, Lines 102-106 construct the rest of the head.

The baseModel is already initialized with ImageNet weights per Line 92. On Lines 114 and 115, we set the base layers in VGG16 as not trainable (i.e., they will not be updated during the backpropagation phase). Be sure to read my previous fine-tuning tutorial for further explanation.

We're now ready to compile and train the model with our data:
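A sketch of compilation and training (the batch size of 32 and the trainX/trainY/testX/testY variable names are my assumptions; fit_generator and Adam(lr=...) reflect the Keras API of that era):

```python
from keras.optimizers import Adam

# compile the model with the Adam optimizer and a 1e-4 learning rate
opt = Adam(lr=1e-4)
model.compile(loss="binary_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train the network head, streaming batches from the
# training/validation generators defined earlier
BS = 32
H = model.fit_generator(
    trainAug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=valAug.flow(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=args["epochs"])
```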

The model is compiled with the Adam optimizer and a 1e-4 learning rate (Lines 120-122).

We use "binary_crossentropy" for 2-class classification. If you have more than two classes of data, be sure to use "categorical_crossentropy".

Lines 128-133 then train our transfer learning network. Our training and validation generators are put to work in the process.

Upon training completion, we'll evaluate the network and plot the training history:
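A sketch of evaluation and plotting (note that the history keys may be "acc"/"val_acc" or "accuracy"/"val_accuracy" depending on your Keras version; the class names are hard-coded here for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report

# evaluate the network on the testing set
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
    predictions.argmax(axis=1), target_names=["cat", "dog"]))

# plot the training loss and accuracy history
N = args["epochs"]
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")

# save the plot figure to disk
plt.savefig(args["plot"])
```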

Lines 137-139 evaluate our model and print a classification report for statistical analysis.

We then employ matplotlib to plot our accuracy and loss history during training (Lines 142-152). The plot figure is saved to disk via Line 153.

Fine-tuning a CNN using the updated input dimensions

Figure 4: Changing Keras input shape dimensions for fine-tuning produced the following accuracy/loss training plot.

To fine-tune our CNN using the updated input dimensions, first make sure you've used the "Downloads" section of this guide to download the (1) source code and (2) example dataset.

From there, open up a terminal and execute the following command:
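For example (train.py here is a placeholder for the training script's actual name):

```shell
$ python train.py --dataset dogs_vs_cats_small
```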

Our first set of output shows our updated input shape dimensions.

Notice how our input_1 (i.e., the InputLayer) has input dimensions of 128×128×3 versus the normal 224×224×3 for VGG16.

The input image will then forward propagate through the network until the final MaxPooling2D layer (i.e., block5_pool).

At this point, our output volume has dimensions of 4×4×512 (for reference, VGG16 with a 224×224×3 input volume would have a 7×7×512 volume after this layer).

Note: If your input image dimensions are too small, you risk the model effectively reducing the tensor volume into "nothing" and then running out of data, leading to an error. See the "Can I make the input dimensions anything I want?" section of this post for more details.

We then flatten that volume and apply the FC layers from the headModel, ultimately leading to our final classification.

Once our model is constructed, we can then fine-tune it:

At the end of fine-tuning, we see that our model has obtained 93% accuracy, respectable given our small image dataset.

As Figure 4 demonstrates, our training is also quite stable, with no signs of overfitting.

More importantly, you now know how to change the input image shape dimensions of a pre-trained network and then apply feature extraction/fine-tuning using Keras!

Be sure to use this tutorial as a template for whenever you need to apply transfer learning to a pre-trained network with different image dimensions than what it was originally trained on.


Summary

In this tutorial, you learned how to change input shape dimensions for fine-tuning with Keras.

We typically perform such an operation when we want to apply transfer learning, including both feature extraction and fine-tuning.

Using the methods in this guide, you can update your input image dimensions for your pre-trained CNN and then perform transfer learning; however, there are two caveats you need to look out for:

  1. If your input images are too small, Keras will error out.
  2. If your input images are too large, you may not obtain your desired accuracy.

Be sure to refer to the "Can I make the input dimensions anything I want?" section of this post for more details on these caveats, including suggestions on how to remedy them.

I hope you enjoyed this tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!