In this tutorial, you will learn about learning rate schedules and decay using Keras. You'll discover how to use Keras' standard learning rate decay along with step-based, linear, and polynomial learning rate schedules.

When training a neural network, the learning rate is often the most important hyperparameter for you to tune:

- Too small a learning rate and your neural network may not learn at all
- Too large a learning rate and you may overshoot areas of low loss (or even overfit from the start of training)

When it comes to training a neural network, the most bang for your buck (in terms of accuracy) is going to come from selecting the correct learning rate and an appropriate learning rate schedule.

But that's easier said than done.

To help deep learning practitioners such as yourself learn how to assess a problem and choose an appropriate learning rate, we'll be starting a series of tutorials on learning rate schedules, decay, and hyperparameter tuning with Keras.

By the end of this series, you'll have a good understanding of how to appropriately and effectively apply learning rate schedules with Keras to your own deep learning projects.

To learn how to use Keras for learning rate schedules and decay, just keep reading!

Contents

- 1 Keras learning rate schedules and decay
- 1.1 Why modify our learning rate and use learning rate schedules?
- 1.2 Project structure
- 1.3 The standard "decay" schedule in Keras
- 1.4 Our LearningRateDecay class
- 1.5 Step-based learning rate schedules with Keras
- 1.6 Linear and polynomial learning rate schedules in Keras
- 1.7 Implementing our training script
- 1.8 Keras learning rate schedule results
- 1.9 Commentary on learning rate schedule experiments
- 1.10 Do other learning rate schedules exist?
- 1.11 How do I choose my initial learning rate?
- 1.12 Where can I learn more?

- 2 Summary
- 3 Downloads

## Keras learning rate schedules and decay

In the first part of this guide, we'll discuss why the learning rate is the most important hyperparameter when it comes to training your own deep neural networks.

We'll then dive into why we may want to adjust our learning rate during training.

From there I'll show you how to implement and utilize a number of learning rate schedules with Keras, including:

- The decay schedule built into most Keras optimizers
- Step-based learning rate schedules
- Linear learning rate decay
- Polynomial learning rate schedules

We'll then run a number of experiments on the CIFAR-10 dataset using these learning rate schedules and evaluate which one performed best.

These sets of experiments will serve as a template you can use when exploring your own deep learning projects and selecting an appropriate learning rate and learning rate schedule.

### Why modify our learning rate and use learning rate schedules?

To see why learning rate schedules are a worthwhile method to apply to help increase model accuracy and descend into areas of lower loss, consider the standard weight update formula used by nearly all neural networks:

W += -lr * gradient

Recall that the learning rate, `lr`, controls the "step" we make along the gradient. Larger values of `lr` imply that we are taking bigger steps, while smaller values of `lr` make tiny steps. If `lr` is zero, the network cannot make any steps at all (since the gradient multiplied by zero is zero).
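To make this concrete, here is a minimal, illustrative gradient descent step in plain Python. The quadratic loss f(w) = w² (with gradient 2w) is a stand-in of ours, not part of the tutorial's code:

```python
# one vanilla gradient descent step: W += -lr * gradient
def gd_step(w, grad, lr):
    return w + (-lr * grad)

# descend the toy loss f(w) = w^2, whose gradient is 2w
w = 1.0
for _ in range(10):
    w = gd_step(w, 2.0 * w, lr=0.1)  # each step multiplies w by (1 - 0.2)

print(round(w, 6))  # 0.107374  (= 0.8 ** 10, shrinking toward the minimum)

# lr = 0 takes no step at all
print(gd_step(5.0, 10.0, lr=0.0))  # 5.0
```

With `lr=0.1` the iterate steadily shrinks toward the minimum at w = 0; with `lr=0` it never moves, which is exactly the behavior described above.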

Most initial learning rates (though not all) you encounter are typically in the set {1e-1, 1e-2, 1e-3}.

A network is then trained for a fixed number of epochs without changing the learning rate.

This approach may work well in some situations, but it's often beneficial to decrease our learning rate over time. When training our network, we are trying to find some location along our loss landscape where the network obtains reasonable accuracy. It doesn't have to be a global minimum or even a local minimum, but in practice, simply finding an area of the loss landscape with reasonably low loss is "good enough".

If we constantly keep the learning rate high, we may overshoot these areas of low loss, as we'll be taking steps that are too large to descend into those regions.

Instead, what we can do is decrease our learning rate, thereby allowing our network to take smaller steps. This decreased learning rate enables the network to descend into areas of the loss landscape that are "more optimal" and would have otherwise been missed entirely by a learning rate that stayed large.

We can, therefore, view the process of learning rate scheduling as:

- Finding a set of reasonably "good" weights early in the training process with a larger learning rate.
- Tuning these weights later in the process to find more optimal weights using a smaller learning rate.

We'll be covering some of the most popular learning rate schedules in this tutorial.

### Project structure

Once you've grabbed and extracted the "Downloads", go ahead and use the `tree` command to inspect the project folder:

```
$ tree
.
├── output
│   ├── lr_linear_schedule.png
│   ├── lr_poly_schedule.png
│   ├── lr_step_schedule.png
│   ├── train_linear_schedule.png
│   ├── train_no_schedule.png
│   ├── train_poly_schedule.png
│   ├── train_standard_schedule.png
│   └── train_step_schedule.png
├── pyimagesearch
│   ├── __init__.py
│   ├── learning_rate_schedulers.py
│   └── resnet.py
└── train.py

2 directories, 12 files
```


Our `output/` directory will contain learning rate and training history plots. The five experiments included in the results section correspond to the five plots with the `train_*.png` filenames, respectively.

The `pyimagesearch` module contains our ResNet CNN and our `learning_rate_schedulers.py`. The `LearningRateDecay` parent class simply includes a method called `plot` for plotting each of our types of learning rate decay. Also included are the subclasses, `StepDecay` and `PolynomialDecay`, which calculate the learning rate upon the completion of each epoch. Both of these classes include the `plot` method via inheritance (an object-oriented concept).

Our training script, `train.py`, will train ResNet on the CIFAR-10 dataset. We'll run the script with no learning rate decay as well as with standard, linear, step-based, and polynomial learning rate decay.

### The standard "decay" schedule in Keras

The Keras library ships with a time-based learning rate scheduler. It is controlled via the `decay` parameter of the optimizer classes (such as `SGD`, `Adam`, etc.).

To discover how we can utilize this type of learning rate decay, let's take a look at an example of how we may initialize the ResNet architecture and the SGD optimizer:

```python
# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-2, momentum=0.9, decay=1e-2 / epochs)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
	(64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
```

Here we initialize our SGD optimizer with an initial learning rate of `1e-2`. We then set our `decay` to be the learning rate divided by the total number of epochs we're training the network for (a common rule of thumb).

Internally, Keras applies the following learning rate schedule to adjust the learning rate after every batch update. It is a misconception that Keras updates the standard decay once per epoch; keep this in mind when using the default learning rate scheduler supplied with Keras.

The update formula follows:

lr = init_lr * (1.0 / (1.0 + decay * iterations))

Using the CIFAR-10 dataset as an example, we have a total of 50,000 training images.

If we use a batch size of `64`, that implies there are ⌈50,000 / 64⌉ = 782 steps per epoch. Therefore, a total of `782` weight updates need to be applied before an epoch completes.

To see an example of the learning rate schedule calculation, let's assume our initial learning rate is `init_lr = 0.01` and our `decay = 0.01 / 40 = 0.00025` (with the assumption that we are training for forty epochs).

The learning rate at step zero, before any learning rate schedule has been applied, is:

lr = 0.01 * (1.0 / (1.0 + 0.00025 * 0)) = 0.01

At the beginning of epoch one, after 782 weight updates, we see the following learning rate:

lr = 0.01 * (1.0 / (1.0 + 0.00025 * 782)) ≈ 0.00836

Figure 1 below continues the calculation of Keras' standard learning rate decay with `init_lr = 0.01` and `decay = 0.00025`:
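We can sanity-check these numbers with a few lines of plain Python (assuming a batch size of 64, an initial learning rate of 0.01, and a decay of 0.01 / 40 for a 40-epoch run):

```python
import math

# Keras-style time-based decay: applied after every batch update,
# not once per epoch
def time_based_decay(init_lr, decay, iterations):
    return init_lr * (1.0 / (1.0 + decay * iterations))

init_lr = 0.01
decay = init_lr / 40          # rule of thumb: lr / total epochs
steps_per_epoch = math.ceil(50000 / 64)

print(steps_per_epoch)                      # 782
print(time_based_decay(init_lr, decay, 0))  # 0.01
lr_after_epoch_1 = time_based_decay(init_lr, decay, steps_per_epoch)
print(round(lr_after_epoch_1, 5))           # 0.00836
```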

You'll learn how to utilize this type of learning rate decay in the "Implementing our training script" and "Keras learning rate schedule results" sections of this post, respectively.

### Our LearningRateDecay class

In the remainder of this tutorial, we'll be implementing our own custom learning rate schedules and then incorporating them with Keras when training our neural networks.

To keep our code neat and tidy, not to mention follow object-oriented programming best practices, let's first define a base `LearningRateDecay` class that we'll subclass for each respective learning rate schedule.

Open up `learning_rate_schedulers.py` in your directory structure and insert the following code:

```python
# import the required packages
import matplotlib.pyplot as plt
import numpy as np

class LearningRateDecay:
	def plot(self, epochs, title="Learning Rate Schedule"):
		# compute the set of learning rates for each corresponding
		# epoch
		lrs = [self(i) for i in epochs]

		# plot the learning rate schedule
		plt.style.use("ggplot")
		plt.figure()
		plt.plot(epochs, lrs)
		plt.title(title)
		plt.xlabel("Epoch #")
		plt.ylabel("Learning Rate")
```


Each and every learning rate schedule we implement will have a `plot` function, enabling us to visualize our learning rate over time.

With our base `LearningRateDecay` class implemented, let's move on to creating a step-based learning rate schedule.

### Step-based learning rate schedules with Keras

One popular learning rate scheduler is step-based decay, where we systematically drop the learning rate after specific epochs during training.

The step decay learning rate scheduler can be seen as a piecewise function, as visualized in Figure 2. Here the learning rate is constant for a number of epochs, then drops, is constant once more, then drops again, and so on.

When applying step decay to our learning rate, we have two options:

- Define an equation that models the piecewise drop in learning rate that we wish to achieve.
- Use what I call the `ctrl + c` method to train a deep neural network. Here we train for some number of epochs at a given learning rate, eventually notice validation performance stagnating/stalling, then `ctrl + c` to stop the script, adjust our learning rate, and continue training.

We'll primarily be focusing on the equation-based piecewise drop approach to learning rate scheduling in this post.

The `ctrl + c` method is a bit more advanced and is typically applied to larger datasets using deeper neural networks where the exact number of epochs required to obtain a reasonable model is unknown.

If you'd like to learn more about the `ctrl + c` method of training, please refer to Deep Learning for Computer Vision with Python.

When applying step decay, we often drop our learning rate by either (1) one half or (2) an order of magnitude after every fixed number of epochs. For example, let's suppose our initial learning rate is `alpha`.

After 10 epochs we drop the learning rate to `0.5 * alpha`.

After another 10 epochs (i.e., the 20th total epoch), the learning rate is dropped by a factor of `0.5` again, leaving us with `0.25 * alpha`, and so on.

In fact, this is the exact same learning rate schedule depicted in Figure 2 (red line).

The blue line shows a more aggressive drop factor of `0.25`. Modeled mathematically, we can define our step-based decay equation as:

alpha = initAlpha * factor ** floor((1 + epoch) / dropEvery)

where `initAlpha` is the initial learning rate, `factor` is the value controlling the rate at which the learning rate drops, `dropEvery` is the "drop every" epochs value, and `epoch` is the current epoch.

The larger our `factor` is, the slower the learning rate will decay. Conversely, the smaller the `factor`, the faster the learning rate will decay.

All that said, let's go ahead and implement our `StepDecay` class now.

Go back to your `learning_rate_schedulers.py` file and insert the following code:

```python
class StepDecay(LearningRateDecay):
	def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
		# store the base initial learning rate, drop factor, and
		# epochs to drop every
		self.initAlpha = initAlpha
		self.factor = factor
		self.dropEvery = dropEvery

	def __call__(self, epoch):
		# compute the learning rate for the current epoch
		exp = np.floor((1 + epoch) / self.dropEvery)
		alpha = self.initAlpha * (self.factor ** exp)

		# return the learning rate
		return float(alpha)
```


Line 20 defines the constructor to our `StepDecay` class. We then store the initial learning rate (`initAlpha`), drop factor, and `dropEvery` epochs values (Lines 23-25).

The `__call__` function:

- Accepts the current `epoch` number.
- Computes the learning rate based on the step-based decay formula detailed above (Lines 29 and 30).
- Returns the computed learning rate for the current epoch (Line 33).

You'll see how to use this learning rate schedule later in this post.
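To get a feel for how the schedule behaves, here is a standalone sketch of the same step-decay computation as a plain function (using `math.floor` in place of `np.floor` so it has no dependencies), with the default parameters:

```python
import math

# the step-based decay formula as a plain function; the defaults mirror
# StepDecay (initAlpha=0.01, factor=0.25, dropEvery=10)
def step_decay(epoch, initAlpha=0.01, factor=0.25, dropEvery=10):
    exp = math.floor((1 + epoch) / dropEvery)
    return float(initAlpha * (factor ** exp))

print(step_decay(0))    # 0.01      (epochs 0-8 keep the initial rate)
print(step_decay(9))    # 0.0025    (first drop: 0.01 * 0.25)
print(step_decay(19))   # 0.000625  (second drop: 0.01 * 0.25 ** 2)
```

Note that because of the `1 + epoch` term, the first drop lands on epoch 9 rather than epoch 10.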

### Linear and polynomial learning rate schedules in Keras

Two of my favorite learning rate schedules are linear learning rate decay and polynomial learning rate decay.

Using these methods our learning rate is decayed to zero over a fixed number of epochs.

The rate at which the learning rate is decayed is based on the parameters of the polynomial function. A smaller exponent/power will cause the learning rate to decay "more slowly", whereas larger exponents decay the learning rate "more quickly".

Conveniently, both of these methods can be implemented in a single class:

```python
class PolynomialDecay(LearningRateDecay):
	def __init__(self, maxEpochs=100, initAlpha=0.01, power=1.0):
		# store the maximum number of epochs, base learning rate,
		# and power of the polynomial
		self.maxEpochs = maxEpochs
		self.initAlpha = initAlpha
		self.power = power

	def __call__(self, epoch):
		# compute the new learning rate based on polynomial decay
		decay = (1 - (epoch / float(self.maxEpochs))) ** self.power
		alpha = self.initAlpha * decay

		# return the new learning rate
		return float(alpha)
```


Line 36 defines the constructor to our `PolynomialDecay` class, which requires three values:

- `maxEpochs`: The total number of epochs we'll be training for.
- `initAlpha`: The initial learning rate.
- `power`: The power/exponent of the polynomial.

Note that if you set `power=1.0` then you have a linear learning rate decay.

Lines 45 and 46 compute the adjusted learning rate for the current epoch, while Line 49 returns the new learning rate.
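As a standalone illustration (the same formula written as a plain function, so it runs without the class above), we can compare `power=1` (linear) against `power=5`:

```python
# polynomial decay as a plain function, mirroring PolynomialDecay.__call__
def poly_decay(epoch, maxEpochs=100, initAlpha=0.01, power=1.0):
    decay = (1 - (epoch / float(maxEpochs))) ** power
    return float(initAlpha * decay)

# linear decay (power=1) is at exactly half the initial rate halfway through
print(poly_decay(50, power=1))   # 0.005

# power=5 decays much more aggressively: 0.01 * 0.5 ** 5
print(poly_decay(50, power=5))   # 0.0003125

# both variants decay the learning rate to zero by the final epoch
print(poly_decay(100, power=1), poly_decay(100, power=5))  # 0.0 0.0
```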

### Implementing our training script

Now that we've implemented a few different Keras learning rate schedules, let's see how we can use them inside an actual training script.

Create a file named `train.py` in your editor and insert the following code:

```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the required packages
from pyimagesearch.learning_rate_schedulers import StepDecay
from pyimagesearch.learning_rate_schedulers import PolynomialDecay
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.callbacks import LearningRateScheduler
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse
```


Lines 2-16 import our required packages. Line 3 sets the `matplotlib` backend so that we can create plots as image files. Our most notable imports include:

- `StepDecay`: Our class which calculates and plots step-based learning rate decay.
- `PolynomialDecay`: The class we wrote to calculate polynomial-based learning rate decay.
- `ResNet`: Our Convolutional Neural Network implemented in Keras.
- `LearningRateScheduler`: A Keras callback. We'll pass our learning rate `schedule` to this class, which will be called as a callback at the completion of each epoch to calculate our learning rate.

Let's move on and parse our command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--schedule", type=str, default="",
	help="learning rate schedule method")
ap.add_argument("-e", "--epochs", type=int, default=100,
	help="# of epochs to train for")
ap.add_argument("-l", "--lr-plot", type=str, default="lr.png",
	help="path to output learning rate plot")
ap.add_argument("-t", "--train-plot", type=str, default="training.png",
	help="path to output training plot")
args = vars(ap.parse_args())
```


Our script accepts any of four command line arguments when the script is called via the terminal:

- `--schedule`: The learning rate schedule method. Valid options are "standard", "step", "linear", and "poly". By default, no learning rate schedule will be used.
- `--epochs`: The number of epochs to train for (`default=100`).
- `--lr-plot`: The path to the output learning rate plot. I suggest overriding the `default` of `lr.png` with a more descriptive path + filename.
- `--train-plot`: The path to the output accuracy/loss training history plot. Again, I suggest a descriptive path + filename, otherwise `training.png` will be set by `default`.
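A quick, illustrative check of the parser's behavior (rebuilding the same parser and calling `parse_args` with an explicit argument list instead of the real command line):

```python
import argparse

# rebuild the same parser to inspect its behavior without running train.py
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--schedule", type=str, default="",
	help="learning rate schedule method")
ap.add_argument("-e", "--epochs", type=int, default=100,
	help="# of epochs to train for")
ap.add_argument("-l", "--lr-plot", type=str, default="lr.png",
	help="path to output learning rate plot")
ap.add_argument("-t", "--train-plot", type=str, default="training.png",
	help="path to output training plot")

# with no flags we fall back to the defaults: no schedule, 100 epochs
args = vars(ap.parse_args([]))
print(args["epochs"])    # 100
print(args["schedule"])  # "" (empty string, i.e. no schedule)

# note the key names: argparse converts "--train-plot" to "train_plot",
# which is why the script indexes args["train_plot"] and args["lr_plot"]
args = vars(ap.parse_args(["--schedule", "step", "--epochs", "50"]))
print(args["schedule"], args["epochs"])  # step 50
```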

With our imports and command line arguments in hand, now it’s time to initialize our learning rate schedule:

# retailer the number of epochs to train for in a convenience variable,

# then initialize the record of callbacks and learning rate scheduler

# for use

epochs = args[“epochs”]
callbacks = []
schedule = None

# examine to see if step-based learning rate decay must be used

if args[“schedule”] == “step”:

print(“[INFO] using ‘step-based’ learning rate decay…”)

schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15)

# examine to see if linear learning rate decay should ought to be used

elif args[“schedule”] == “linear”:

print(“[INFO] using ‘linear’ learning rate decay…”)

schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1)

# verify to see if a polynomial learning rate decay ought to be used

elif args[“schedule”] == “poly”:

print(“[INFO] using ‘polynomial’ learning rate decay…”)

schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, energy=5)

# if the learning rate schedule just isn’t empty, add it to the listing of

# callbacks

if schedule is just not None:

callbacks = [LearningRateScheduler(schedule)]


Line 33 sets the number of `epochs` we'll train for directly from the command line `args` variable. From there we initialize our `callbacks` list and learning rate `schedule` (Lines 34 and 35).

Lines 38-50 then select the learning rate `schedule` if `args["schedule"]` contains a valid value:

- `"step"`: Initializes `StepDecay`.
- `"linear"`: Initializes `PolynomialDecay` with `power=1`, indicating that a linear learning rate decay will be utilized.
- `"poly"`: `PolynomialDecay` with `power=5` will be used.

After you've reproduced the results of the experiments in this tutorial, be sure to revisit Lines 38-50 and insert additional `elif` statements of your own so you can run some of your own experiments!

Lines 54 and 55 initialize the `LearningRateScheduler` with the schedule as a single callback part of the `callbacks` list. There is also the case where no learning rate decay will be used (i.e., the `--schedule` command line argument is not provided when the script is executed).
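Conceptually, what the `LearningRateScheduler` callback does with our `schedule` can be sketched as a plain loop (a toy stand-in of ours, not Keras' real API):

```python
# toy stand-in for Keras' LearningRateScheduler: once per epoch, the
# callback calls schedule(epoch) and installs the result as the
# optimizer's learning rate
def run_with_scheduler(schedule, numEpochs):
    lrs = []
    for epoch in range(numEpochs):
        lr = schedule(epoch)  # invokes our __call__ method
        lrs.append(lr)
        # ... one epoch of training would run here at learning rate lr ...
    return lrs

# any callable works as a schedule; here, halving on every epoch
lrs = run_with_scheduler(lambda e: 0.1 * (0.5 ** e), 4)
print(lrs)  # [0.1, 0.05, 0.025, 0.0125]
```

This is why our `StepDecay` and `PolynomialDecay` classes implement `__call__`: the callback simply treats the schedule as a function of the epoch number.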

Let's go ahead and load our data:

```python
# load the training and testing data, then scale it into the
# range [0, 1]
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
	"dog", "frog", "horse", "ship", "truck"]
```


Line 60 loads our CIFAR-10 data. The dataset is conveniently already split into training and testing sets.

The only preprocessing we must perform is scaling the data into the range [0, 1] (Lines 61 and 62).

Lines 65-67 binarize the labels, and then Lines 70 and 71 initialize our `labelNames` (i.e., classes). Do not add to or alter the `labelNames` list, as the order and length of the list matter.
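What `LabelBinarizer` is doing here is one-hot encoding the integer class labels. A minimal numpy equivalent (purely illustrative; the script itself uses scikit-learn):

```python
import numpy as np

# one-hot encode integer labels 0..9, as LabelBinarizer does for CIFAR-10
labels = np.array([3, 0, 9])          # e.g. cat, airplane, truck
one_hot = np.zeros((len(labels), 10))
one_hot[np.arange(len(labels)), labels] = 1.0

print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```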

Let's initialize our `decay` parameter:

```python
# initialize the decay for the optimizer
decay = 0.0

# if we are using Keras' "standard" decay, then we need to set the
# decay parameter
if args["schedule"] == "standard":
	print("[INFO] using 'keras standard' learning rate decay...")
	decay = 1e-1 / epochs

# otherwise, no learning rate schedule is being used
elif schedule is None:
	print("[INFO] no learning rate schedule being used")
```


Line 74 initializes our learning rate `decay`.

If we're using the `"standard"` learning rate decay schedule, then the decay is initialized as `1e-1 / epochs` (Lines 78-80).

With all of our initializations taken care of, let's go ahead and compile + train our `ResNet` model:

```python
# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-1, momentum=0.9, decay=decay)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
	(64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	batch_size=128, epochs=epochs, callbacks=callbacks, verbose=1)
```


Our Stochastic Gradient Descent (`SGD`) optimizer is initialized on Line 87 using our `decay`.

From there, Lines 88 and 89 build our `ResNet` CNN with an input shape of 32x32x3 and 10 classes. For an in-depth review of ResNet, be sure to refer to Chapter 10: ResNet of Deep Learning for Computer Vision with Python.

Our `model` is compiled with a `loss` function of `"categorical_crossentropy"` since our dataset has > 2 classes. If you use a different dataset with only 2 classes, be sure to use `loss="binary_crossentropy"`.

Lines 94 and 95 kick off our training process. Notice that we've supplied the `callbacks` as a parameter. The `callbacks` will be called when each epoch is completed; the `LearningRateScheduler` contained therein will handle our learning rate decay (as long as `callbacks` isn't an empty list).

Finally, let's evaluate our network and generate plots:

```python
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=128)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames))

# plot the training loss and accuracy
N = np.arange(0, args["epochs"])
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on CIFAR-10")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["train_plot"])

# if the learning rate schedule is not empty, then save the learning
# rate plot
if schedule is not None:
	schedule.plot(N)
	plt.savefig(args["lr_plot"])
```


Lines 99-101 evaluate our network and print a classification report to our terminal.

Lines 104-115 generate and save our training history plot (accuracy/loss curves). Lines 119-121 generate a learning rate schedule plot, if applicable. We'll inspect these plot visualizations in the next section.

### Keras learning rate schedule results

With both our (1) learning rate schedules and (2) training script implemented, let's run some experiments to see which learning rate schedule will perform best given:

- An initial learning rate of `1e-1`
- Training for a total of `100` epochs

#### Experiment #1: No learning rate decay/schedule

As a baseline, let's first train our ResNet model on CIFAR-10 with no learning rate decay or schedule:

```
$ python train.py --train-plot output/train_no_schedule.png
[INFO] loading CIFAR-10 data...
[INFO] no learning rate schedule being used
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.1204 - acc: 0.4372 - val_loss: 1.9361 - val_acc: 0.5118
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.5150 - acc: 0.6440 - val_loss: 1.5013 - val_acc: 0.6413
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2186 - acc: 0.7369 - val_loss: 1.2288 - val_acc: 0.7315
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5220 - acc: 0.9568 - val_loss: 1.0223 - val_acc: 0.8372
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5349 - acc: 0.9532 - val_loss: 1.0423 - val_acc: 0.8230
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5209 - acc: 0.9579 - val_loss: 0.9883 - val_acc: 0.8421
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.84      0.86      0.85      1000
  automobile       0.90      0.93      0.92      1000
        bird       0.83      0.74      0.78      1000
         cat       0.67      0.79      0.73      1000
        deer       0.78      0.88      0.83      1000
         dog       0.85      0.69      0.76      1000
        frog       0.85      0.89      0.87      1000
       horse       0.94      0.82      0.88      1000
        ship       0.91      0.90      0.90      1000
       truck       0.90      0.90      0.90      1000

   micro avg       0.84      0.84      0.84     10000
   macro avg       0.85      0.84      0.84     10000
weighted avg       0.85      0.84      0.84     10000
```


Here we obtain ~85% accuracy, but as we can see, validation loss and accuracy stagnate past epoch ~15 and do not improve over the rest of the 100 epochs.

Our goal is now to utilize learning rate scheduling to beat our 85% accuracy (without overfitting).

#### Experiment #2: Keras standard optimizer learning rate decay

In our second experiment we will use Keras' standard decay-based learning rate schedule:

$ python train.py --schedule standard --train-plot output/train_standard_schedule.png
[INFO] loading CIFAR-10 data...
[INFO] using 'keras standard' learning rate decay...
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 184s 4ms/step - loss: 2.1074 - acc: 0.4460 - val_loss: 1.8397 - val_acc: 0.5334
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.5068 - acc: 0.6516 - val_loss: 1.5099 - val_acc: 0.6663
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2097 - acc: 0.7512 - val_loss: 1.2928 - val_acc: 0.7176
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1752 - acc: 1.0000 - val_loss: 0.8892 - val_acc: 0.8209
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1746 - acc: 1.0000 - val_loss: 0.8923 - val_acc: 0.8204
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1740 - acc: 1.0000 - val_loss: 0.8924 - val_acc: 0.8208
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.81      0.86      0.84      1000
  automobile       0.91      0.91      0.91      1000
        bird       0.75      0.71      0.73      1000
         cat       0.68      0.65      0.66      1000
        deer       0.78      0.81      0.79      1000
         dog       0.77      0.74      0.75      1000
        frog       0.83      0.88      0.85      1000
       horse       0.86      0.87      0.86      1000
        ship       0.90      0.90      0.90      1000
       truck       0.90      0.88      0.89      1000

   micro avg       0.82      0.82      0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000


This time we only obtain 82% accuracy, which goes to show that learning rate decay/scheduling will not always improve your results! You need to be careful about which learning rate schedule you utilize.
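For reference, the standard Keras optimizer decay shrinks the learning rate by a factor of 1/(1 + decay * iterations). A minimal pure-Python sketch of that update (the `standard_decay_lr` helper name is ours, not Keras'):

```python
def standard_decay_lr(init_lr, decay, iteration):
    """Keras-style time-based decay: lr = init_lr / (1 + decay * iteration).

    With decay = init_lr / total_epochs (the convention used in this
    tutorial series), the learning rate shrinks gradually on every update.
    """
    return init_lr * (1.0 / (1.0 + decay * iteration))

# at the very first update the learning rate is untouched
print(standard_decay_lr(0.1, 0.1 / 100, 0))     # 0.1
# after many updates it has shrunk considerably
print(standard_decay_lr(0.1, 0.1 / 100, 1000))  # 0.05
```

Because the rate never drops sharply, the loss curve under this schedule tends to flatten out smoothly rather than showing the "stair-step" jumps we'll see next.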

#### Experiment #3: Step-based learning rate schedule results

Let's go ahead and perform step-based learning rate scheduling, which will drop our learning rate by a factor of 0.25 every 15 epochs:

$ python train.py --schedule step --lr-plot output/lr_step_schedule.png --train-plot output/train_step_schedule.png
[INFO] using 'step-based' learning rate decay...
[INFO] loading CIFAR-10 data...
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.2839 - acc: 0.4328 - val_loss: 1.8936 - val_acc: 0.5530
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.6425 - acc: 0.6213 - val_loss: 1.4599 - val_acc: 0.6749
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2971 - acc: 0.7177 - val_loss: 1.3298 - val_acc: 0.6953
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7221 - val_acc: 0.8653
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7228 - val_acc: 0.8661
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7267 - val_acc: 0.8652
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.86      0.89      0.87      1000
  automobile       0.94      0.93      0.94      1000
        bird       0.83      0.80      0.81      1000
         cat       0.75      0.73      0.74      1000
        deer       0.82      0.87      0.84      1000
         dog       0.82      0.77      0.79      1000
        frog       0.89      0.90      0.90      1000
       horse       0.91      0.90      0.90      1000
        ship       0.93      0.93      0.93      1000
       truck       0.90      0.93      0.92      1000

   micro avg       0.87      0.87      0.87     10000
   macro avg       0.86      0.87      0.86     10000
weighted avg       0.86      0.87      0.86     10000


Figure 5 (left) visualizes our learning rate schedule. Notice how after every 15 epochs our learning rate drops, creating the "stair-step"-like effect.

Figure 5 (right) demonstrates the classic signs of step-based learning rate scheduling; you can clearly see our:

- Training/validation loss decrease
- Training/validation accuracy increase

...when our learning rate is dropped.

This is especially pronounced in the first two drops (epochs 15 and 30), after which the drops become less substantial.

This kind of steep drop is a classic sign of a step-based learning rate schedule being utilized; if you see that kind of training behavior in a paper, publication, or another tutorial, you can be almost certain they used step-based decay!

Getting back to our accuracy, we're now at 86-87% accuracy, an improvement over our first experiment.
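The step schedule used in this experiment can be sketched as a standalone function. A minimal sketch of the drop-every-15-epochs behavior (the exact epoch-offset convention in the StepDecay class implemented earlier may differ by one epoch):

```python
def step_decay(epoch, init_lr=1e-1, factor=0.25, drop_every=15):
    """Step-based schedule: multiply init_lr by `factor` once every
    `drop_every` epochs, producing the stair-step learning rate curve."""
    num_drops = epoch // drop_every  # number of drops that have occurred
    return float(init_lr * (factor ** num_drops))

print(step_decay(0))    # 0.1     (no drops yet)
print(step_decay(15))   # 0.025   (one drop: 0.1 * 0.25)
print(step_decay(30))   # 0.00625 (two drops)
```

Plotting this function over 100 epochs reproduces the stair-step shape seen in Figure 5 (left).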

#### Experiment #4: Linear learning rate schedule results

Let's try using a linear learning rate schedule with Keras by setting power=1.0:

$ python train.py --schedule linear --lr-plot output/lr_linear_schedule.png --train-plot output/train_linear_schedule.png
[INFO] using 'linear' learning rate decay...
[INFO] loading CIFAR-10 data...
Epoch 1/100
50000/50000 [==============================] - 187s 4ms/step - loss: 2.0399 - acc: 0.4541 - val_loss: 1.6900 - val_acc: 0.5789
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.4623 - acc: 0.6588 - val_loss: 1.4535 - val_acc: 0.6557
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.1790 - acc: 0.7480 - val_loss: 1.2633 - val_acc: 0.7230
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1025 - acc: 1.0000 - val_loss: 0.5623 - val_acc: 0.8804
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1021 - acc: 1.0000 - val_loss: 0.5636 - val_acc: 0.8800
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1019 - acc: 1.0000 - val_loss: 0.5622 - val_acc: 0.8808
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.88      0.91      0.89      1000
  automobile       0.94      0.94      0.94      1000
        bird       0.84      0.81      0.82      1000
         cat       0.78      0.76      0.77      1000
        deer       0.86      0.90      0.88      1000
         dog       0.84      0.80      0.82      1000
        frog       0.90      0.92      0.91      1000
       horse       0.91      0.91      0.91      1000
        ship       0.93      0.94      0.93      1000
       truck       0.93      0.93      0.93      1000

   micro avg       0.88      0.88      0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000


Figure 6 (left) shows that our learning rate is decreasing linearly over time while Figure 6 (right) visualizes our training history.

We are now seeing a sharper drop in both training and validation loss, especially past roughly epoch 75; however, note that our training loss is dropping significantly faster than our validation loss, so we may be at risk of overfitting.

Regardless, we are now obtaining 88% accuracy on our data, our best result so far.
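Both the linear and the upcoming polynomial experiments are instances of the same polynomial decay formula, lr = init_lr * (1 - epoch / max_epochs) ** power. A minimal sketch (the `poly_decay` helper name is ours):

```python
def poly_decay(epoch, max_epochs=100, init_lr=1e-1, power=1.0):
    """Polynomial decay to zero over max_epochs.

    power=1.0 yields the linear schedule from this experiment;
    power=5.0 yields the sharper polynomial drop used in the next one.
    """
    return float(init_lr * (1.0 - (epoch / float(max_epochs))) ** power)

print(poly_decay(50))             # 0.05  (linear: halfway through, half the LR)
print(poly_decay(50, power=5.0))  # 0.003125  (0.1 * 0.5 ** 5)
print(poly_decay(100))            # 0.0   (fully decayed at the final epoch)
```

Raising `power` above 1.0 front-loads the decay: the learning rate falls off quickly early on and spends more of training near zero.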

#### Experiment #5: Polynomial learning rate schedule results

As a final experiment, let's apply polynomial learning rate scheduling with Keras by setting power=5:

$ python train.py --schedule poly --lr-plot output/lr_poly_schedule.png --train-plot output/train_poly_schedule.png
[INFO] using 'polynomial' learning rate decay...
[INFO] loading CIFAR-10 data...
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.0470 - acc: 0.4445 - val_loss: 1.7379 - val_acc: 0.5576
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.4793 - acc: 0.6448 - val_loss: 1.4536 - val_acc: 0.6513
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2080 - acc: 0.7332 - val_loss: 1.2363 - val_acc: 0.7183
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1547 - acc: 1.0000 - val_loss: 0.6960 - val_acc: 0.8581
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1547 - acc: 1.0000 - val_loss: 0.6883 - val_acc: 0.8596
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1548 - acc: 1.0000 - val_loss: 0.6942 - val_acc: 0.8601
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.86      0.89      0.87      1000
  automobile       0.94      0.94      0.94      1000
        bird       0.78      0.80      0.79      1000
         cat       0.75      0.70      0.73      1000
        deer       0.83      0.86      0.84      1000
         dog       0.81      0.78      0.79      1000
        frog       0.86      0.91      0.89      1000
       horse       0.92      0.88      0.90      1000
        ship       0.94      0.92      0.93      1000
       truck       0.91      0.92      0.91      1000

   micro avg       0.86      0.86      0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000


Figure 7 (left) visualizes the fact that our learning rate is now decaying according to our polynomial function, while Figure 7 (right) plots our training history.

This time we obtain ~86% accuracy.

### Commentary on learning rate schedule experiments

Our best result came from our fourth experiment, where we utilized a linear learning rate schedule.

But does that mean we should always use a linear learning rate schedule?

No, far from it, actually.

The key takeaway here is that for this:

- Particular dataset (CIFAR-10)
- Particular neural network architecture (ResNet)
- Initial learning rate of 1e-2
- Number of training epochs (100)

...linear learning rate scheduling worked the best.

No two deep learning projects are alike, so you will need to run your own set of experiments, including varying the initial learning rate and the total number of epochs, to determine the appropriate learning rate schedule (further commentary is included in the "Summary" section of this tutorial as well).

### Do other learning rate schedules exist?

Other learning rate schedules do exist, and in fact, any mathematical function that accepts an epoch or batch number as input and returns a learning rate can be considered a "learning rate schedule". Two other learning rate schedules you may encounter include (1) exponential learning rate decay, as well as (2) cyclical learning rates.

I don't often use exponential decay, as I find that linear and polynomial decay are more than sufficient, but you're more than welcome to subclass the LearningRateDecay class and implement exponential decay if you wish.
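If you do want to try it, exponential decay can be sketched as a plain function (in practice you would wrap this in the same LearningRateDecay subclass pattern as the other schedules; the decay constant `k` here is an arbitrary example value):

```python
import math

def exponential_decay(epoch, init_lr=1e-1, k=0.1):
    """Exponential schedule: the learning rate shrinks by a constant
    factor of e**-k every epoch."""
    return init_lr * math.exp(-k * epoch)

# the ratio between consecutive epochs is constant (e**-0.1, about 0.905)
print(round(exponential_decay(0), 6))   # 0.1
print(round(exponential_decay(10), 6))  # 0.036788
```

Like polynomial decay with a large power, this front-loads the decay, but it never actually reaches zero.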

Cyclical learning rates, on the other hand, are very powerful; we'll be covering cyclical learning rates in a later tutorial in this series.

### How do I select my initial learning rate?

You'll notice that in this tutorial we didn't vary our learning rate; we kept it fixed at 1e-2.

When performing your own experiments you'll want to combine:

- Learning rate schedules...
- ...with different learning rates

Don't be afraid to mix and match!

The four most important hyperparameters you'll want to explore include:

- Initial learning rate
- Number of training epochs
- Learning rate schedule
- Regularization strength/amount (L2, dropout, etc.)

Finding an appropriate balance of each can be challenging, but through many experiments, you'll be able to find a recipe that leads to a highly accurate neural network.
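One way to keep such a search organized is to enumerate the combinations up front and log each run. A hypothetical sketch (the grid values are examples, and the `run_experiment` stand-in represents your own training script):

```python
from itertools import product

# hypothetical hyperparameter grid -- adjust to your own project
init_lrs = [1e-1, 1e-2, 1e-3]
schedules = [None, "standard", "step", "linear", "poly"]
epoch_counts = [50, 100]

experiments = list(product(init_lrs, schedules, epoch_counts))
print(len(experiments))  # 30 combinations to run, log, and compare

for init_lr, schedule, epochs in experiments:
    # a call like run_experiment(init_lr, schedule, epochs) would go here;
    # record each configuration and its result in your experiment log
    pass
```

Even a coarse grid like this makes it obvious which regions of hyperparameter space deserve a finer follow-up search.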

If you'd like to learn more about my tips, suggestions, and best practices for learning rates, learning rate schedules, and training your own neural networks, refer to my book, Deep Learning for Computer Vision with Python.

### Where can I learn more?

Today's tutorial introduced you to learning rate decay and schedulers using Keras. To learn more about learning rates, schedulers, and how to write custom callback functions, refer to my book, Deep Learning for Computer Vision with Python.

**Inside the book I cover:**

- More details on learning rates (and how a solid understanding of the concept impacts your deep learning success)
- How to spot under/overfitting on-the-fly with a custom training monitor callback
- How to checkpoint your models with a custom callback
- My tips/tricks, suggestions, and best practices for training CNNs

In addition to content on learning rates, you'll also find:

- Super-practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
- Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
- A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

## Summary

In this tutorial, you learned how to utilize Keras for learning rate decay and learning rate scheduling.

Specifically, you learned how to implement and utilize a number of learning rate schedules with Keras, including:

- The decay schedule built into most Keras optimizers
- Step-based learning rate schedules
- Linear learning rate decay
- Polynomial learning rate schedules

After implementing our learning rate schedules, we evaluated each in a set of experiments on the CIFAR-10 dataset.

Our results demonstrated that for an initial learning rate of 1e-2, the linear learning rate schedule, decaying over 100 epochs, performed the best.

However, this does not mean that a linear learning rate schedule will always outperform other types of schedules. Instead, all this means is that for this:

- Particular dataset (CIFAR-10)
- Particular neural network architecture (ResNet)
- Initial learning rate of 1e-2
- Number of training epochs (100)

...linear learning rate scheduling worked the best.

No two deep learning projects are alike, so you will need to run your own set of experiments, including varying the initial learning rate, to determine the appropriate learning rate schedule.

I recommend you keep an experiment log that details your hyperparameter choices and associated results; that way you can refer back to it and double down on experiments that look promising.

Don't expect that you'll be able to train a neural network and be "one and done"; that rarely, if ever, happens. Instead, set the expectation with yourself that you'll be running many experiments and tuning hyperparameters as you go. Machine learning, deep learning, and artificial intelligence as a whole are iterative: you build on your previous results.

Later in this series of tutorials I'll also be showing you how to choose your initial learning rate.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!