Looping and Convergence

A key component of most artificial intelligence and machine learning is looping, i.e. the system improving over many iterations of training. A very simple method to train in this way is just to perform updates in a for loop. We saw an example of this way back in lesson 2:

import tensorflow as tf


x = tf.Variable(0, name='x')

model = tf.initialize_all_variables()

with tf.Session() as session:
    for i in range(5):
        session.run(model)
        # x + 1 builds a new add operation each pass; x is rebound to that new tensor
        x = x + 1
        print(session.run(x))

We can alter this workflow to instead loop until a condition on a variable is met, i.e. a convergence test, such as in the following:

import tensorflow as tf

x = tf.Variable(0., name='x')
threshold = tf.constant(5.)

model = tf.initialize_all_variables()

with tf.Session() as session:
    session.run(model)
    # note: the tf.less test and the x + 1 below both build new operations on every pass of the loop
    while session.run(tf.less(x, threshold)):
        x = x + 1
        x_value = session.run(x)
        print(x_value)

The major change here is that the loop is now a while loop, which continues looping while the test (using tf.less for a less-than test) is true. Here, we check whether x is less than a given threshold (stored in a constant), and if it is, we keep looping.
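
As noted in the comment above, the tf.less test and x + 1 both create new operations in the graph on every pass of the loop. A minimal variation (a sketch, assuming the same old-style API as above) builds the update and the test once, using tf.assign, and then just runs them repeatedly:

import tensorflow as tf

x = tf.Variable(0., name='x')
threshold = tf.constant(5.)

# Build the update and the convergence test once, outside the loop
increment = tf.assign(x, x + 1)      # writes x + 1 back into the variable
still_below = tf.less(x, threshold)  # the convergence test

model = tf.initialize_all_variables()

with tf.Session() as session:
    session.run(model)
    while session.run(still_below):
        print(session.run(increment))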

Gradient Descent

Any machine learning library must have a gradient descent algorithm. I think it is a law. Regardless, TensorFlow has a few variations on the theme, and they are quite straightforward to use.

Gradient Descent is a learning algorithm that attempts to minimise some error. Which error, you ask? Well, that is up to us, although there are a few commonly used choices.
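
To make the idea concrete, here is a tiny hand-rolled sketch (plain Python, on a made-up one-dimensional problem): we minimise the error (w - 3)^2 by repeatedly stepping against its gradient, 2 * (w - 3). The optimisers below do exactly this, just on bigger models and with the gradients computed for us.

# A hand-rolled sketch of gradient descent: minimise the toy error (w - 3)**2
w = 0.0
learning_rate = 0.1
for step in range(100):
    gradient = 2 * (w - 3)            # derivative of the error with respect to w
    w = w - learning_rate * gradient  # step against the gradient
print(w)  # converges towards 3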

Let’s start with a basic example:

import tensorflow as tf
import numpy as np

# x and y are placeholders for our training data
x = tf.placeholder("float")
y = tf.placeholder("float")
# w is the variable storing our values. It is initialised with starting "guesses"
# w[0] is the "a" in our equation, w[1] is the "b"
w = tf.Variable([1.0, 2.0], name="w")
# Our model of y = a*x + b
y_model = tf.mul(x, w[0]) + w[1]

# Our error is defined as the square of the differences
error = tf.square(y - y_model)
# The Gradient Descent Optimizer does the heavy lifting
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)

# Normal TensorFlow - initialize values, create a session and run the model
model = tf.initialize_all_variables()

with tf.Session() as session:
    session.run(model)
    for i in range(1000):
        x_value = np.random.rand()
        y_value = x_value * 2 + 6
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

The major line of interest here is train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error), where the training step is defined. It aims to minimise the value of the error tensor, which we defined earlier as the square of the differences (a common error function). The 0.01 is the learning rate, the size of the step it takes when trying to learn a better value.
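
If you are curious what minimize is doing under the hood, the optimiser also exposes compute_gradients and apply_gradients; calling minimize is roughly the two chained together. A sketch, reusing the error defined above:

optimizer = tf.train.GradientDescentOptimizer(0.01)
# minimize(error) is roughly these two steps chained together:
grads_and_vars = optimizer.compute_gradients(error)   # d(error)/dw for each trainable variable
train_op = optimizer.apply_gradients(grads_and_vars)  # applies w <- w - 0.01 * gradient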

An important note here is that we are optimising over just a single Variable, but that Variable can hold an array of values. This is why we used w as the Variable, rather than two separate Variables a and b.

Other Optimisation

TensorFlow has a whole set of optimisation methods, and also lets you define your own (if you are into that sort of thing). For the API and how to use them, see this page. The listed ones are:

  • GradientDescentOptimizer
  • AdagradOptimizer
  • MomentumOptimizer
  • AdamOptimizer
  • FtrlOptimizer
  • RMSPropOptimizer

Other optimisation methods are likely to appear in future releases of TensorFlow, or in third-party code. That said, the above optimisers will be sufficient for most deep learning techniques. If you aren’t sure which one to use, start with GradientDescentOptimizer, unless that is failing.
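
Swapping optimisers is usually a one-line change. As a sketch, to try the Adam optimiser instead (keeping the 0.01 learning rate purely for illustration; each optimiser has its own sensible defaults):

# Instead of the GradientDescentOptimizer line above, you could try:
train_op = tf.train.AdamOptimizer(0.01).minimize(error)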

Plotting the error

We can plot the error after each iteration to see how it decreases as training progresses:

The code for this is a small change to the above. First, we create a list to store the errors in. Then, inside the loop, we explicitly compute both the train_op and the error. We do this in a single session.run call, so that the error is computed only once. If we ran them as separate calls, TensorFlow would compute the error, then run the training step, and as part of the training step it would have to recompute the error.

Below I’ve put just the code from the initialize_all_variables() line onwards from the previous program - everything above that line is the same.

errors = []
with tf.Session() as session:
    session.run(model)
    for i in range(1000):
        # Generate a random training sample; note this builds new ops in the graph on every iteration
        x_train = tf.random_normal((1,), mean=5, stddev=2.0)
        y_train = x_train * 2 + 6
        x_value, y_value = session.run([x_train, y_train])
        _, error_value = session.run([train_op, error], feed_dict={x: x_value, y: y_value})
        errors.append(error_value)
    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

import matplotlib.pyplot as plt
plt.plot([np.mean(errors[max(0, i-50):i+1]) for i in range(len(errors))])
plt.savefig("errors.png")  # save before plt.show(), otherwise the saved figure may be blank
plt.show()

You may have noticed that I take a windowed average here – averaging the most recent errors (up to 50 of them) with np.mean, instead of just plotting errors[i] directly. The reason for this is that we are only running a single sample inside the loop, so while the error tends to decrease, it bounces around quite a bit. Taking this windowed average smooths it out, but as you can see above, it still jumps around.
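
An alternative way to smooth, as a sketch, is a simple moving average computed with np.convolve, which avoids the fiddly list slicing:

# A moving average over a 50-sample window, as an alternative to the list comprehension above
window = 50
smoothed = np.convolve(errors, np.ones(window) / window, mode='valid')
plt.plot(smoothed)
plt.show()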

Exercises

1) Create a convergence function for the k-means example from Lesson 6, which stops the training if the distance between the old centroids and the new centroids is less than a given epsilon value.

2) Try separating the a and b values from the Gradient Descent example (where a single w is used).

3) Our example trains on just a single example at a time, which is inefficient. Extend it to learn using a number (say, 50) of training samples at a time.
