Training and Convergence

A key component of most artificial intelligence and machine learning systems is iteration: the system improves over many passes of training. A very simple way to train like this is just to perform updates in a for loop. We saw an example of this approach back in Lesson 2:

import tensorflow as tf


x = tf.Variable(0, name='x')

model = tf.global_variables_initializer()

with tf.Session() as session:
    for i in range(5):
        session.run(model)
        # Note: this rebinds the Python name x to a new tensor that adds 1;
        # it does not update the Variable in place
        x = x + 1
        print(session.run(x))

We can alter this workflow to instead loop until a variable reaches a convergence threshold, as in the following:

import tensorflow as tf

x = tf.Variable(0., name='x')
threshold = tf.constant(5.)

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    while session.run(tf.less(x, threshold)):
        x = x + 1
        x_value = session.run(x)
        print(x_value)

The major change here is that the loop is now a while loop, which keeps looping while the test (using tf.less for a less-than test) is true. Here, we test whether x is less than a given threshold (stored in a constant), and if so, we continue looping.
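As a side note, both the x + 1 op and the tf.less test are rebuilt on every pass through that loop, so the graph grows with each iteration. A minimal variant of the same idea (just a sketch) builds the update and the test once, using tf.assign_add to update the Variable in place:

import tensorflow as tf

x = tf.Variable(0., name='x')
threshold = tf.constant(5.)

# Define the update and the convergence test once, outside the loop
increment = tf.assign_add(x, 1.)       # x <- x + 1, updating the Variable in place
still_below = tf.less(x, threshold)    # True while x has not reached the threshold

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    while session.run(still_below):
        print(session.run(increment))

This prints the same values as the version above, but the graph stays the same size no matter how many iterations run.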

Gradient Descent

Any machine learning library must have a gradient descent algorithm. I think it is a law. Regardless, TensorFlow has a few variations on the theme, and they are quite straightforward to use.

Gradient Descent is a learning algorithm that attempts to minimise some error. Which error, you ask? Well, that is up to us, although there are a few commonly used options.
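To give a flavour of the common choices, here is a small sketch in TensorFlow's notation (the placeholder names here are just stand-ins for the true value and a model's prediction):

import tensorflow as tf

y = tf.placeholder("float")        # the true value
y_model = tf.placeholder("float")  # stand-in for the model's prediction

# Squared error - the one we use in the example below
squared_error = tf.square(y - y_model)
# Absolute error - less sensitive to large outliers
absolute_error = tf.abs(y - y_model)
# Mean squared error, averaging over a batch of predictions
mean_squared_error = tf.reduce_mean(tf.square(y - y_model))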

Let’s start with a basic example:

import tensorflow as tf
import numpy as np

# x and y are placeholders for our training data
x = tf.placeholder("float")
y = tf.placeholder("float")
# w is the variable storing our values. It is initialised with starting "guesses"
# w[0] is the "a" in our equation, w[1] is the "b"
w = tf.Variable([1.0, 2.0], name="w")
# Our model of y = a*x + b
y_model = tf.multiply(x, w[0]) + w[1]

# Our error is defined as the square of the differences
error = tf.square(y - y_model)
# The Gradient Descent Optimizer does the heavy lifting
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)

# Normal TensorFlow - initialize values, create a session and run the model
model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    for i in range(1000):
        # Generate a random training sample from the "true" model y = 2x + 6
        x_value = np.random.rand()
        y_value = x_value * 2 + 6
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

The major line of interest here is train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error), where the training step is defined. It aims to minimise the value of the error tensor, which we defined earlier as the square of the differences (a common error function). The 0.01 is the learning rate: the size of the step the optimiser takes when it tries to learn a better value.
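To make that concrete, here is a rough sketch of what a single plain gradient descent step does under the hood, reusing error and w from the example above (learning_rate and manual_train_op are names I have made up for illustration, and this is not a drop-in replacement for the optimiser): compute the gradient of the error with respect to w, then move w a small step against that gradient.

learning_rate = 0.01
# d(error)/dw - a tensor with the same shape as w
gradient = tf.gradients(error, [w])[0]
# One step of gradient descent: w <- w - learning_rate * gradient
manual_train_op = tf.assign(w, w - learning_rate * gradient)

Running manual_train_op repeatedly (with x and y fed as before) should behave much like the built-in optimiser for this simple model, although the built-in optimisers handle many details for you.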

An important note here is that we are optimising just a single Variable, but that Variable can be an array. This is why we used w as the Variable, rather than two separate Variables a and b.

Other Optimisation Methods

TensorFlow has a whole set of optimisation methods, and also gives you the ability to define your own (if you are into that sort of thing). For the API details on how to use them, see this page. The listed ones are:

  • GradientDescentOptimizer
  • AdagradOptimizer
  • MomentumOptimizer
  • AdamOptimizer
  • FtrlOptimizer
  • RMSPropOptimizer

Other optimisation methods are likely to appear in future releases of TensorFlow, or in third-party code. That said, the optimisers above will be sufficient for most deep learning techniques. If you aren't sure which one to use, start with GradientDescentOptimizer, and only switch if it is failing to converge.
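Swapping between them is usually a one-line change to how train_op is defined. As a rough sketch (the learning rates below are illustrative only, not tuned for this problem):

# Adam, an adaptive learning rate method
train_op = tf.train.AdamOptimizer(0.001).minimize(error)

# ... or gradient descent with momentum
train_op = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(error)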

Plotting the error

We can plot the errors after each iteration to get the following output:

The code for this is a small change to the above. First, we create a list to store the errors in. Then, inside the loop, we explicitly compute both train_op and the error. We do this in a single session.run call, so that the error is computed only once. If we did it in two separate calls, it would compute the error, then run the training step, and in doing so it would need to recompute the error.

Below I've put only the code from after the tf.global_variables_initializer() line of the previous program - everything above that line is the same.

errors = []
# Training data generators: a random x, and the corresponding "true" y = 2x + 6.
# Defining these ops once, outside the loop, keeps the graph from growing
# on every iteration.
x_train = tf.random_normal((1,), mean=5, stddev=2.0)
y_train = x_train * 2 + 6

with tf.Session() as session:
    session.run(model)
    for i in range(1000):
        x_value, y_value = session.run([x_train, y_train])
        # Run the training step and fetch the error in a single call,
        # so the error is only computed once per iteration
        _, error_value = session.run([train_op, error], feed_dict={x: x_value, y: y_value})
        errors.append(error_value)
    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

import matplotlib.pyplot as plt
# Plot a windowed (moving) average of the error so the downward trend is visible
plt.plot([np.mean(errors[max(0, i - 50):i + 1]) for i in range(len(errors))])
# Save before calling show(), which can clear the current figure
plt.savefig("errors.png")
plt.show()

You may have noticed that I take a windowed average here – plotting the mean of the last 50 errors rather than just errors[i]. The reason for this is that we only train on a single sample in each iteration, so while the error tends to decrease, it bounces around quite a bit. Taking this windowed average smooths it out a bit, but as you can see above, it still jumps around.
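If you prefer to compute the moving average in one go, an alternative (my own variation, not part of the original code, reusing the np and plt imports from above) is np.convolve; with mode='valid' it simply drops the first window - 1 points instead of averaging a partial window:

window = 50
# Flatten the list of length-1 arrays into a single 1-D array first
flat_errors = np.ravel(errors)
smoothed = np.convolve(flat_errors, np.ones(window) / window, mode='valid')
plt.plot(smoothed)
plt.savefig("errors_smoothed.png")
plt.show()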

Exercises

1) Create a convergence function for the k-means example from Lesson 6, which stops the training if the distance between the old centroids and the new centroids is less than a given epsilon value.

2) Try separating the a and b values from the Gradient Descent example (where w was used) into two separate Variables.

3) Our example trains on just a single sample at a time, which is inefficient. Extend it to learn using a batch of training samples (say, 50) at a time.
