Keras ImageDataGenerator and Data Augmentation

In today’s tutorial, you will learn how to use Keras’ ImageDataGenerator class to perform data augmentation. I’ll also dispel common confusions surrounding what data augmentation is, why we use data augmentation, and what it does/doesn’t do.

Knowing that I was going to write a tutorial on data augmentation, two weekends ago I decided to have some fun and purposely post a semi-trick question on my Twitter feed.

The question was simple: data augmentation does which of the following?

  1. Adds more training data
  2. Replaces training data
  3. Does both
  4. I don’t know

Here are the results:

Figure 1: My @PyImageSearch Twitter poll on the concept of data augmentation.

Only 5% of respondents answered this trick question “correctly” (at least if you’re using Keras’ ImageDataGenerator class).

Again, it’s a trick question, so that’s not exactly a fair assessment, but here’s the deal:

While the word “augment” means to make something “greater” or “increase” something (in this case, data), the Keras ImageDataGenerator class actually works by:

  1. Accepting a batch of images used for training.
  2. Taking this batch and applying a series of random transformations to each image in the batch (including random rotation, resizing, shearing, etc.).
  3. Replacing the original batch with the new, randomly transformed batch.
  4. Training the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).

That’s right: the Keras ImageDataGenerator class is not an “additive” operation. It’s not taking the original data, randomly transforming it, and then returning both the original data and the transformed data.

Instead, the ImageDataGenerator accepts the original data, randomly transforms it, and returns only the new, transformed data.

But remember how I said this was a trick question?

Technically, all the answers are correct, but the only way you know if a given definition of data augmentation is correct is via the context of its application.

I’ll help you clear up some of the confusion regarding data augmentation (and provide the context you need to successfully apply it).

Inside the rest of today’s tutorial you will:

  • Learn about three types of data augmentation.
  • Dispel any confusion you have surrounding data augmentation.
  • Learn how to apply data augmentation with Keras and the ImageDataGenerator class.

To learn more about data augmentation, including using Keras’ ImageDataGenerator class, just keep reading!

We’ll start this tutorial with a discussion of data augmentation and why we use it.

I’ll then cover the three types of data augmentation you’ll see when training deep neural networks:

  1. Dataset generation and data expansion via data augmentation (less common)
  2. In-place/on-the-fly data augmentation (most common)
  3. Combining dataset generation and in-place augmentation

From there I’ll teach you how to apply data augmentation to your own datasets (using all three methods) using Keras’ ImageDataGenerator class.

What is data augmentation?

Data augmentation encompasses a wide range of techniques used to generate “new” training samples from the original ones by applying random jitters and perturbations (while at the same time ensuring that the class labels of the data are not changed).

Our goal when applying data augmentation is to increase the generalizability of the model.

Given that our network is constantly seeing new, slightly modified versions of the input data, the network is able to learn more robust features.

At testing time we do not apply data augmentation and simply evaluate our trained network on the unmodified testing data. In most cases, you’ll see an increase in testing accuracy, perhaps at the expense of a slight dip in training accuracy.

A simple data augmentation example

Figure 2: Left: A sample of 250 data points that follow a normal distribution exactly. Right: Adding a small amount of random “jitter” to the distribution. This type of data augmentation increases the generalizability of our networks.

Let’s consider Figure 2 (left) of a normal distribution with zero mean and unit variance.

Training a machine learning model on this data may result in us modeling the distribution exactly. However, in real-world applications, data rarely follows such a nice, neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some random values epsilon drawn from a random distribution (right).

Our plot still follows an approximately normal distribution, but it’s not a perfect distribution as on the left.

A model trained on this modified, augmented data is more likely to generalize to example data points not included in the training set.
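
To make the jitter idea concrete, here is a minimal sketch using only Python’s standard library. It is not the code used to produce Figure 2; the seed, the noise scale of 0.1, and the function name jitter are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable (an arbitrary choice)

# 250 samples drawn from a normal distribution with zero mean and unit variance
points = [random.gauss(0.0, 1.0) for _ in range(250)]


def jitter(values, scale=0.1):
    """Return a new list with a small random epsilon added to every value."""
    return [v + random.gauss(0.0, scale) for v in values]


jittered = jitter(points)

# The jittered sample keeps roughly the same shape but no longer matches exactly.
print(f"original: mean={statistics.mean(points):.3f} stdev={statistics.stdev(points):.3f}")
print(f"jittered: mean={statistics.mean(jittered):.3f} stdev={statistics.stdev(jittered):.3f}")
```

The number of points is unchanged; only their positions move slightly, which is the same idea the image transforms below apply to pixels.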

Computer vision and data augmentation

Figure 3: In computer vision, data augmentation performs random manipulations on images. It is typically applied in three scenarios discussed in this blog post.

Data augmentation lends itself naturally to computer vision.

For example, we can obtain augmented data from the original images by applying simple geometric transforms, such as random:

  1. Translations
  2. Rotations
  3. Changes in scale
  4. Shearing
  5. Horizontal (and in some cases, vertical) flips

Applying a (small) amount of these transformations to an input image will change its appearance slightly, but it does not change the class label, thereby making data augmentation a very natural, easy method to apply for computer vision tasks.
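
The sketch below illustrates, under simplifying assumptions, why geometric transforms are label-preserving. It uses a tiny nested-list “image” and two hand-written helpers (horizontal_flip and translate_right are hypothetical names, not Keras functions); a real pipeline would operate on image arrays instead.

```python
# A tiny 3x4 "image" stored as nested lists of pixel intensities.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
label = "cat"


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate_right(img, shift, fill=0):
    """Shift every row `shift` pixels to the right, padding the gap with `fill`."""
    width = len(img[0])
    shift = max(0, min(shift, width))
    return [[fill] * shift + row[:width - shift] for row in img]


flipped = horizontal_flip(image)
shifted = translate_right(image, 1)

# The pixels change, but each transformed image keeps the original class label.
augmented = [(flipped, label), (shifted, label)]
for img, lbl in augmented:
    print(lbl, img[0])
```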

Three types of data augmentation

There are three types of data augmentation you’ll likely encounter when applying deep learning in the context of computer vision applications.

Exactly which definition of data augmentation is “correct” is entirely dependent on the context of your project/set of experiments.

Take the time to read this section carefully as I see many deep learning practitioners confuse what data augmentation does and does not do.

Type #1: Dataset generation and expanding an existing dataset (less common)

Figure 4: Type #1 of data augmentation consists of dataset generation/dataset expansion. This is a less common form of data augmentation.

The first type of data augmentation is what I call dataset generation or dataset expansion.

As you know, machine learning models, and especially neural networks, can require quite a bit of training data. But what if you don’t have very much training data in the first place?

Let’s examine the most trivial case where you only have one image and you want to apply data augmentation to create an entire dataset of images, all based on that one image.

To accomplish this task, you would:

  1. Load the original input image from disk.
  2. Randomly transform the original image via a series of random translations, rotations, etc.
  3. Take the transformed image and write it back out to disk.
  4. Repeat steps 2 and 3 a total of N times.

After performing this process you would have a directory full of randomly transformed “new” images that you could use for training, all based on that single input image.
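
The four steps above can be sketched roughly as follows. This is a standard-library-only illustration, not the Keras-based script used later in this tutorial: it stores tiny grayscale images as plain-text PGM files, and the helper names (write_pgm, read_pgm, random_transform, generate_dataset) and the choice of transforms (a flip plus a one-pixel wrap-around shift) are assumptions made to keep the example self-contained.

```python
import os
import random
import tempfile


def write_pgm(path, img):
    """Write a grayscale image (nested lists, values 0-255) as a plain-text PGM file."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(img[0])} {len(img)}\n255\n")
        for row in img:
            f.write(" ".join(str(p) for p in row) + "\n")


def read_pgm(path):
    """Read a plain-text PGM file produced by write_pgm."""
    with open(path) as f:
        tokens = f.read().split()
    width, height = int(tokens[1]), int(tokens[2])
    pixels = [int(t) for t in tokens[4:]]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def random_transform(img, rng):
    """Randomly flip the image and shift it by -1, 0, or 1 pixel (wrapping around)."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    shift = rng.randint(-1, 1)
    if shift == 0:
        return [row[:] for row in img]
    return [row[-shift:] + row[:-shift] for row in img]


def generate_dataset(input_path, output_dir, total, seed=0):
    rng = random.Random(seed)
    original = read_pgm(input_path)                       # step 1: load from disk
    os.makedirs(output_dir, exist_ok=True)
    for i in range(total):                                # step 4: repeat N times
        transformed = random_transform(original, rng)     # step 2: random transform
        out_path = os.path.join(output_dir, f"image_{i:04d}.pgm")
        write_pgm(out_path, transformed)                  # step 3: write back to disk
    return sorted(os.listdir(output_dir))


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "cat.pgm")
    write_pgm(source, [[0, 64, 128, 255], [255, 128, 64, 0]])
    files = generate_dataset(source, os.path.join(workdir, "generated"), total=10)
    print(len(files), "images written")
```

Every generated file is derived from the same source image, which is exactly why this approach adds volume without adding much new information.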

This is, of course, an incredibly simplified example.

You more than likely have more than a single image. You probably have 10s or 100s of images, and now your goal is to turn that smaller set into 1000s of images for training.

In those situations, dataset expansion and dataset generation may be worth exploring.

But there’s a problem with this approach: we haven’t exactly increased the ability of our model to generalize.

Yes, we have increased our training data by generating additional examples, but all of these examples are based on a very small dataset.

Keep in mind that our neural network is only as good as the data it was trained on.

We cannot expect to train a NN on a small amount of data and then expect it to generalize to data it was never trained on and has never seen before.

If you find yourself seriously considering dataset generation and dataset expansion, you should take a step back and instead invest your time gathering additional data or looking into methods of behavioral cloning (and then applying the type of data augmentation covered in the “Combining dataset generation and in-place augmentation” section below).

Type #2: In-place/on-the-fly data augmentation (most common)

Figure 5: Type #2 of data augmentation consists of on-the-fly image batch manipulations. This is the most common form of data augmentation with Keras.

The second type of data augmentation is called in-place data augmentation or on-the-fly data augmentation. This type of data augmentation is what Keras’ ImageDataGenerator class implements.

Using this type of data augmentation we want to ensure that our network, when trained, sees new variations of our data at each and every epoch.

Figure 5 demonstrates the process of applying in-place data augmentation:

  1. Step #1: An input batch of images is presented to the ImageDataGenerator.
  2. Step #2: The ImageDataGenerator transforms each image in the batch by a series of random translations, rotations, etc.
  3. Step #3: The randomly transformed batch is then returned to the calling function.

There are two important points that I want to draw your attention to:

  1. The ImageDataGenerator is not returning both the original data and the transformed data. The class only returns the randomly transformed data.
  2. We call this “in-place” and “on-the-fly” data augmentation because this augmentation is done at training time (i.e., we are not generating these examples ahead of time/prior to training).

When our model is being trained, we can think of our ImageDataGenerator class as “intercepting” the original data, randomly transforming it, and then returning it to the neural network for training, all the while the NN has no idea the data was modified!
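
One way to picture this “intercepting” behavior is a Python generator that sits between the dataset and the training loop. The sketch below is a conceptual stand-in, not the ImageDataGenerator implementation; the function names, the flip-plus-noise transform, and the toy 2×2 images are all assumptions.

```python
import random


def random_transform(img, rng):
    """Randomly flip an image (nested lists) left-right and add small pixel noise."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    return [[p + rng.uniform(-0.05, 0.05) for p in row] for row in img]


def augmenting_batches(images, labels, batch_size, rng):
    """Yield (batch, labels) pairs forever; each batch holds only transformed images."""
    while True:
        for start in range(0, len(images), batch_size):
            batch = images[start:start + batch_size]
            batch_labels = labels[start:start + batch_size]
            # The originals are replaced, not appended: batch size stays the same.
            yield [random_transform(img, rng) for img in batch], batch_labels


rng = random.Random(7)
images = [[[0.1, 0.9], [0.4, 0.6]] for _ in range(4)]
labels = ["cat", "cat", "dog", "dog"]

gen = augmenting_batches(images, labels, batch_size=2, rng=rng)
epoch_1 = [next(gen) for _ in range(2)]  # two batches cover all four images
epoch_2 = [next(gen) for _ in range(2)]  # same images, freshly re-transformed

print(len(epoch_1[0][0]), "images per batch")
```

Each batch has the same size as the input batch (nothing is added), and because the transforms are re-drawn on every pass, the second epoch sees different pixel values than the first.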

I’ve written previous tutorials on the PyImageSearch blog where readers assumed that Keras’ ImageDataGenerator class is an “additive operation”, similar to the following (incorrect) figure:

Figure 6: How Keras data augmentation does not work.

In the above illustration the ImageDataGenerator accepts an input batch of images, randomly transforms the batch, and then returns both the original batch and the modified data. Again, this is not what the Keras ImageDataGenerator does. Instead, the ImageDataGenerator class will return just the randomly transformed data.

When I explain this concept to readers the next question is often:

But Adrian, what about the original training data? Why is it not used? Isn’t the original training data still useful for training?

Keep in mind that the entire point of the data augmentation technique described in this section is to ensure that the network sees “new” images that it has never “seen” before at each and every epoch.

If we included the original training data along with the augmented data in each batch, then the network would “see” the original training data multiple times, effectively defeating the purpose. Secondly, recall that the overall goal of data augmentation is to increase the generalizability of the model.

To accomplish this goal we “replace” the training data with randomly transformed, augmented data.

In practice, this leads to a model that performs better on our validation/testing data but perhaps performs slightly worse on our training data (due to the variations in data caused by the random transforms).

You’ll learn how to use the Keras ImageDataGenerator class later in this tutorial.

Type #3: Combining dataset generation and in-place augmentation

The final type of data augmentation seeks to combine both dataset generation and in-place augmentation. You may see this type of data augmentation when performing behavioral cloning.

A great example of behavioral cloning can be seen in self-driving car applications.

Creating self-driving car datasets can be extremely time consuming and expensive. A way around the issue is to instead use video games and car driving simulators.

Video game graphics have become so life-like that it’s now possible to use them as training data.

Therefore, instead of driving an actual vehicle, you can instead:

  • Play a video game
  • Write a program to play a video game
  • Use the underlying rendering engine of the video game

…all to generate actual data that can be used for training.

Once you have your training data you can go back and apply Type #2 data augmentation (i.e., in-place/on-the-fly data augmentation) to the data you gathered via your simulation.

Project structure

Before we dive into the code let’s first review our directory structure for the project:

First, there are two dataset directories which are not to be confused:

  • dogs_vs_cats_small/ : A subset of the popular Kaggle Dogs vs. Cats competition dataset. In my curated subset, only 2,000 images (1,000 per class) are present (as opposed to the 25,000 images for the challenge).

  • generated_dataset/ : We’ll create this generated dataset using the cat.jpg and dog.jpg images which are in the parent directory. We’ll utilize data augmentation Type #1 to generate this dataset automatically and fill this directory with images.

Next, we have our pyimagesearch module which contains our implementation of the ResNet CNN classifier.

Today we’ll review two Python scripts:

  • The training script: Used to train models for both Type #1 and Type #2 (and optionally Type #3 if the user so wishes) data augmentation techniques. We’ll perform three training experiments resulting in each of the three plot*.png files in the project folder.

  • The dataset generation script: Used to generate a dataset from a single image using Type #1.

Let’s get started.

Implementing our training script

In the remainder of this tutorial we’ll be performing three experiments:

  1. Experiment #1: Generate a dataset via dataset expansion and train a CNN on it.
  2. Experiment #2: Use a subset of the Kaggle Dogs vs. Cats dataset and train a CNN without data augmentation.
  3. Experiment #3: Repeat the second experiment, but this time with data augmentation.

All of these experiments will be accomplished using the same Python script.

Open up the training script and let’s get started:

On Lines 2-18 our necessary packages are imported. Line 10 is our ImageDataGenerator import from the Keras library, a class for data augmentation.

Let’s go ahead and parse our command line arguments:

Our script accepts three command line arguments via the terminal:

  • --dataset : The path to the input dataset.

  • --augment : Whether “on-the-fly” data augmentation should be used (refer to Type #2 above). By default, this method is not performed.

  • --plot : The path to the output training history plot.
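
A parser for these three arguments might look like the sketch below. The short flags, the spelling --augment for the augmentation switch, its boolean store_true behavior, and the plot.png default are assumptions and may differ from the downloadable script.

```python
import argparse


def build_parser():
    """Build a parser for the three arguments described above (details are assumed)."""
    ap = argparse.ArgumentParser(
        description="Train a CNN with optional on-the-fly data augmentation")
    ap.add_argument("-d", "--dataset", required=True,
                    help="path to the input dataset")
    ap.add_argument("-a", "--augment", action="store_true",
                    help="apply on-the-fly data augmentation (off by default)")
    ap.add_argument("-p", "--plot", default="plot.png",
                    help="path to the output training history plot")
    return ap


# Parse an explicit argument list so the sketch runs without a real command line.
args = vars(build_parser().parse_args(["--dataset", "dogs_vs_cats_small", "--augment"]))
print(args["dataset"], args["augment"], args["plot"])
```

Converting the namespace with vars gives a plain dictionary, which matches the args["..."] lookup style used elsewhere in this post.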

Let’s proceed to initialize hyperparameters and load our image data:

Training hyperparameters, including initial learning rate, batch size, and number of epochs to train for, are initialized on Lines 32-34.

From there Lines 39-53 grab imagePaths, load images, and populate our data and labels lists. The only image preprocessing we perform at this point is to resize each image to 64×64px.

Next, let’s finish preprocessing, encode our labels, and partition our data:

On Line 57, we convert data to a NumPy array as well as scale all pixel intensities to the range [0, 1]. This completes our preprocessing.

From there we perform “one-hot encoding” of our labels (Lines 61-63). This method of encoding our labels results in an array that may look like this:

For this sample of data, there are two cats ([1., 0.]) and five dogs ([0., 1.]), where the label corresponding to the image is marked as “hot”.
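
To see what such an encoding looks like, here is a minimal, library-free sketch. It is not the code from Lines 61-63 of the script; the one_hot helper is a hypothetical stand-in that assumes classes are ordered alphabetically, so “cat” maps to the first position and “dog” to the second.

```python
def one_hot(labels):
    """Map string labels to one-hot vectors, with classes in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    vectors = [
        [1.0 if index[name] == i else 0.0 for i in range(len(classes))]
        for name in labels
    ]
    return classes, vectors


# Two cats and five dogs, matching the sample described above.
classes, encoded = one_hot(["cat", "dog", "dog", "cat", "dog", "dog", "dog"])
for vec in encoded:
    print(vec)
```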

From there we partition our data into training and testing splits, marking 75% of our data for training and the remaining 25% for testing (Lines 67 and 68).
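
A 75/25 split can be sketched without any third-party library as shown below. The split_data helper, the fixed seed, and the integer stand-ins for images are assumptions for illustration; the actual script may rely on a library routine instead.

```python
import random


def split_data(data, labels, test_size=0.25, seed=42):
    """Shuffle sample indices, then hold out `test_size` of them for testing."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(data) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(data, train_idx), pick(data, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Integers stand in for the 2,000 images; labels follow the 1,000-per-class layout.
data = list(range(2000))
labels = ["cat"] * 1000 + ["dog"] * 1000

trainX, testX, trainY, testY = split_data(data, labels)
print(len(trainX), len(testX))
```

Shuffling indices rather than the data itself keeps every image paired with its label on both sides of the split.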

Now, we are ready to initialize our data augmentation object:

Line 71 initializes our empty data augmentation object (i.e., no augmentation will be performed). This is the default operation of this script.

Let’s check if we’re going to override the default with the --augment command line argument:

Line 75 checks to see if we are performing data augmentation. If so, we re-initialize the data augmentation object with random transformation parameters (Lines 77-84). As the parameters indicate, random rotations, zooms, shifts, shears, and flips will be performed during in-place/on-the-fly data augmentation.

Let’s compile and train our model:

Lines 88-92 construct our ResNet model using Stochastic Gradient Descent optimization and learning rate decay. We use “binary_crossentropy” loss for this 2-class problem. If you have more than two classes, be sure to use “categorical_crossentropy”.

Lines 96-100 then train our model. The aug object handles data augmentation in batches (although be sure to recall that the aug object will only perform data augmentation if the --augment command line argument was set).

Finally, we’ll evaluate our model, print statistics, and generate a training history plot:

Line 104 makes predictions on the test set for evaluation purposes. A classification report is printed via Lines 105 and 106.

From there, Lines 109-120 generate and save an accuracy/loss training plot.

Generating a dataset/dataset expansion with data augmentation and Keras

In our first experiment, we will perform dataset expansion via data augmentation with Keras.

Our dataset will contain 2 classes and initially, the dataset will trivially contain only 1 image per class:

  • Cat: 1 image
  • Dog: 1 image

We’ll utilize Type #1 data augmentation (see the “Type #1: Dataset generation and expanding an existing dataset” section above) to generate a new dataset with 100 images per class:

  • Cat: 100 images
  • Dog: 100 images

Again, this is meant to be an example. In a real-world application you would have 100s of example images, but we’re keeping it simple here so you can learn the concept.

Generating the example dataset

Figure 7: Data augmentation with Keras performs random manipulations on images.

Before we can train our CNN we first need to generate an example dataset.

From our “Project Structure” section above you know that we have two example images in our root directory: cat.jpg and dog.jpg. We will use these example images to generate 100 new training images per class (200 images in total).

To see how we can use data augmentation to generate new examples, open up the dataset generation script and follow along:

Lines 2-6 import our necessary packages. Our ImageDataGenerator is imported on Line 2 and will handle our data augmentation with Keras.

From there, we’ll parse three command line arguments:

  • --image : The path to the input image. We’ll generate additional random, mutated versions of this image.

  • --output : The path to the output directory to store the data augmentation examples.

  • --total : The number of sample images to generate.

Let’s go ahead and load our image and initialize our data augmentation object:

Our image is loaded and prepared for data augmentation via Lines 21-23. Image loading and processing is handled via Keras functionality (i.e. we aren’t using OpenCV).

From there, we initialize the ImageDataGenerator object. This object will facilitate performing random rotations, zooms, shifts, shears, and flips on our input image.

Next, we’ll construct a Python generator and put it to work until all of our images have been produced:

We will use the imageGen to randomly transform the input image (Lines 39 and 40). This generator saves images as .jpg files to the specified output directory contained within args["output"].

Finally, we’ll loop over examples from our image data generator and count them until we’ve reached the required total number of images.
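
The counting loop described above can be sketched as follows, assuming (as the need for a counter suggests) that the generator would otherwise keep yielding images indefinitely. endless_variants is a hypothetical stand-in for the Keras generator and simply yields random flips of a toy 2×2 image.

```python
import random


def endless_variants(image, rng):
    """Yield randomly flipped copies of `image` forever."""
    while True:
        if rng.random() < 0.5:
            yield [row[::-1] for row in image]
        else:
            yield [row[:] for row in image]


rng = random.Random(0)
image_gen = endless_variants([[1, 2], [3, 4]], rng)

total_wanted = 100  # the number of sample images to generate
total = 0
for variant in image_gen:
    total += 1
    # The generator never stops on its own, so break once enough samples exist.
    if total == total_wanted:
        break

print(total)
```

Without the explicit break, a loop over a generator like this would never terminate.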

To run the dataset generation script, make sure you have used the “Downloads” section of the tutorial to download the source code and example images.

From there open up a terminal and execute the following command:

Check the output of the generated_dataset/cats directory and you will now see 100 images:

Let’s do the same now for the “dogs” class:

And now check for the dog images:

A visualization of the dataset generation via data augmentation can be seen in Figure 7 at the top of this section. Notice how we have accepted a single input image (of me, not of a dog or cat) and then created 100 new training examples (48 of which are visualized) from that single image.

Experiment #1: Dataset generation results

We are now ready to perform our first experiment:

Figure 8: Data augmentation with Keras Experiment #1 training accuracy/loss results.

Our results show that we were able to obtain 100% accuracy with little effort.

Of course, this is a trivial, contrived example. In practice, you wouldn’t be taking only a single image and then building a dataset of 100s or 1000s of images via data augmentation. Instead, you would have a dataset of 100s of images and then you would apply dataset generation to that dataset. But again, the point of this section was to demonstrate on a simple example so you could understand the process.

Training a network with in-place data augmentation

The more popular form of (image-based) data augmentation is called in-place data augmentation (see the “Type #2: In-place/on-the-fly data augmentation” section of this post for more details).

When performing in-place augmentation our Keras ImageDataGenerator will:

  1. Accept a batch of input images.
  2. Randomly transform the input batch.
  3. Return the transformed batch to the network for training.

We’ll explore how data augmentation can reduce overfitting and increase the ability of our model to generalize via two experiments.

To accomplish this task we’ll be using a subset of the Kaggle Dogs vs. Cats dataset:

  • Cats: 1,000 images
  • Dogs: 1,000 images

We’ll then train a variation of ResNet, from scratch, on this dataset with and without data augmentation.

Experiment #2: Obtaining a baseline (no data augmentation)

In the first of these two experiments we’ll perform no data augmentation:

Looking at the raw classification report you’ll see that we’re obtaining 64% accuracy, but there’s a problem!

Take a look at the plot associated with our training:

Figure 9: For Experiment #2 we did not perform data augmentation. The result is a plot with strong indications of overfitting.

There is dramatic overfitting occurring: at approximately epoch 15 we see our validation loss start to rise while training loss continues to fall. By epoch 20 the rise in validation loss is especially pronounced.

This type of behavior is indicative of overfitting.

The solution is to (1) reduce model capacity, and/or (2) perform regularization.

Experiment #3: Improving our results (with data augmentation)

Let’s now investigate how data augmentation can act as a form of regularization:

We’re now up to 68% accuracy, an increase from our previous 64% accuracy.

But more importantly, we are no longer overfitting:

Figure 10: For Experiment #3, we performed data augmentation with Keras on batches of images in-place. Our training plot shows no signs of overfitting with this form of regularization.

Note how validation and training loss are falling together with little divergence. Similarly, classification accuracy for both the training and validation splits is growing together as well.

By using data augmentation we were able to combat overfitting!

In nearly all situations, unless you have very good reason not to, you should be performing data augmentation when training your own neural networks.

What’s next?

If you’d like to learn more about data augmentation, including:

  1. More details on the concept of data augmentation.
  2. How to perform data augmentation on your own datasets.
  3. Other forms of regularization to improve your model accuracy.
  4. My tips/tricks, suggestions, and best practices for training CNNs.

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Data augmentation is just one of the sixty-three chapters in the book. You’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image classification, object detection, and segmentation.


Summary

In this tutorial, you learned about data augmentation and how to apply data augmentation via Keras’ ImageDataGenerator class.

You also learned about three types of data augmentation, including:

  1. Dataset generation and data expansion via data augmentation (less common).
  2. In-place/on-the-fly data augmentation (most common).
  3. Combining dataset generation and in-place augmentation.

By default, Keras’ ImageDataGenerator class performs in-place/on-the-fly data augmentation, meaning that the class:

  1. Accepts a batch of images used for training.
  2. Takes this batch and applies a series of random transformations to each image in the batch.
  3. Replaces the original batch with the new, randomly transformed batch.
  4. Trains the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).

All that said, we actually can take the ImageDataGenerator class and use it for dataset generation/expansion as well. We just need to use it to generate our dataset before training.

The final method of data augmentation, combining both in-place and dataset expansion, is rarely used. In those situations, you likely have a small dataset, need to generate additional examples via data augmentation, and then have an additional augmentation/preprocessing at training time.

We wrapped up the guide by performing a number of experiments with data augmentation, noting that data augmentation is a form of regularization, enabling our network to generalize better to our testing/validation set.

This claim of data augmentation as regularization was verified in our experiments when we found that:

  1. Not applying data augmentation at training caused overfitting.
  2. Applying data augmentation allowed for smooth training, no overfitting, and higher accuracy/lower loss.

You should apply data augmentation in all of your experiments unless you have a very good reason not to.

To learn more about data augmentation, including my best practices, tips, and suggestions, be sure to take a look at my book, Deep Learning for Computer Vision with Python.

I hope you enjoyed today’s tutorial!
