Credits: Forked from TensorFlow by Google
Refer to the setup instructions.
Previously, in 1_notmnist.ipynb, we created a pickle with formatted datasets for training, development and testing on the notMNIST dataset.
The goal of this exercise is to progressively train deeper and more accurate models using TensorFlow.
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
import cPickle as pickle
import numpy as np
import tensorflow as tf
First, reload the data we generated in 1_notmnist.ipynb.
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print 'Training set', train_dataset.shape, train_labels.shape
  print 'Validation set', valid_dataset.shape, valid_labels.shape
  print 'Test set', test_dataset.shape, test_labels.shape
Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (18724, 28, 28) (18724,)
Reformat into a shape that's more adapted to the models we're going to train:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
  return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print 'Training set', train_dataset.shape, train_labels.shape
print 'Validation set', valid_dataset.shape, valid_labels.shape
print 'Test set', test_dataset.shape, test_labels.shape
Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (18724, 784) (18724, 10)
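To make the label mapping concrete, here is a small illustrative sketch (not part of the assignment) of the broadcasting trick used in reformat(): comparing np.arange(num_labels) against a column of labels produces a 0/1 matrix, one one-hot row per example. The toy label values here are made up.

# Hypothetical mini-example of the one-hot trick, with 3 classes and 4 labels.
toy_labels = np.array([0, 2, 1, 2])
one_hot = (np.arange(3) == toy_labels[:, None]).astype(np.float32)
# one_hot rows are [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1].
print one_hot.shape  # (4, 3)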
We're first going to train a multinomial logistic regression using simple gradient descent.
TensorFlow works like this:
First you describe the computation that you want to see performed: what the inputs, the variables, and the operations look like. These get created as nodes in a computation graph. This description is all contained within the block below:
with graph.as_default(): ...
Then you can run the operations on this graph as many times as you want by calling session.run(), providing it the outputs to fetch from the graph, which are then returned. This runtime operation is all contained in the block below:
with tf.Session(graph=graph) as session: ...
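As a standalone illustration of this two-phase pattern (not part of the exercise, and assuming the same graph/session API used throughout this notebook), the toy graph below is only described at definition time; the addition actually happens when the session runs it:

toy_graph = tf.Graph()
with toy_graph.as_default():
  # Phase 1: describe the computation. Nothing is computed yet;
  # a, b and total are just nodes in the graph.
  a = tf.constant(2.0)
  b = tf.constant(3.0)
  total = a + b

with tf.Session(graph=toy_graph) as session:
  # Phase 2: run the graph and fetch the value of the 'total' node.
  print session.run(total)  # prints 5.0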
Let's load all the data into TensorFlow and build the computation graph corresponding to our training:
# With gradient descent training, even this much data is prohibitive.
# Subset the training data for faster turnaround.
train_subset = 10000

graph = tf.Graph()
with graph.as_default():

  # Input data.
  # Load the training, validation and test data into constants that are
  # attached to the graph.
  tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
  tf_train_labels = tf.constant(train_labels[:train_subset])
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  # These are the parameters that we are going to be training. The weight
  # matrix will be initialized using random values following a (truncated)
  # normal distribution. The biases get initialized to zero.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))

  # Training computation.
  # We multiply the inputs with the weight matrix, and add biases. We compute
  # the softmax and cross-entropy (it's one operation in TensorFlow, because
  # it's very common, and it can be optimized). We take the average of this
  # cross-entropy across all training examples: that's our loss.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

  # Optimizer.
  # We are going to find the minimum of this loss using gradient descent.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  # Predictions for the training, validation, and test data.
  # These are not part of training, but merely here so that we can report
  # accuracy figures as we train.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
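To see what the fused softmax-plus-cross-entropy op is computing, here is a hypothetical numpy sketch of the same quantity (a loose illustration of the math, not the exact numerically-stabilized implementation TensorFlow uses internally):

def softmax_cross_entropy(logits, one_hot_labels):
  # Softmax turns each row of logits into a probability distribution;
  # subtracting the row max first keeps the exponentials from overflowing.
  shifted = logits - logits.max(axis=1, keepdims=True)
  probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
  # Cross-entropy per example, averaged over the batch: this mirrors the
  # tf.reduce_mean(...) wrapped around the fused op in the graph above.
  return -np.mean(np.sum(one_hot_labels * np.log(probs), axis=1))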
Let's run this computation and iterate:
num_steps = 801

def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

with tf.Session(graph=graph) as session:
  # This is a one-time operation which ensures the parameters get initialized as
  # we described in the graph: random weights for the matrix, zeros for the
  # biases.
  tf.global_variables_initializer().run()
  print 'Initialized'
  for step in xrange(num_steps):
    # Run the computations. We tell .run() that we want to run the optimizer,
    # and get the loss value and the training predictions returned as numpy
    # arrays.
    _, l, predictions = session.run([optimizer, loss, train_prediction])
    if (step % 100 == 0):
      print 'Loss at step', step, ':', l
      print 'Training accuracy: %.1f%%' % accuracy(
        predictions, train_labels[:train_subset, :])
      # Calling .eval() on valid_prediction is basically like calling run(), but
      # just to get that one numpy array. Note that it recomputes all its graph
      # dependencies.
      print 'Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels)
  print 'Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels)
Initialized
Loss at step 0 : 17.2939
Training accuracy: 10.8%
Validation accuracy: 13.8%
Loss at step 100 : 2.26903
Training accuracy: 72.3%
Validation accuracy: 71.6%
Loss at step 200 : 1.84895
Training accuracy: 74.9%
Validation accuracy: 73.9%
Loss at step 300 : 1.60701
Training accuracy: 76.0%
Validation accuracy: 74.5%
Loss at step 400 : 1.43912
Training accuracy: 76.8%
Validation accuracy: 74.8%
Loss at step 500 : 1.31349
Training accuracy: 77.5%
Validation accuracy: 75.0%
Loss at step 600 : 1.21501
Training accuracy: 78.1%
Validation accuracy: 75.4%
Loss at step 700 : 1.13515
Training accuracy: 78.6%
Validation accuracy: 75.4%
Loss at step 800 : 1.0687
Training accuracy: 79.2%
Validation accuracy: 75.6%
Test accuracy: 82.9%
Let's now switch to stochastic gradient descent training instead, which is much faster.
The graph will be similar, except that instead of holding all the training data in a constant node, we create a Placeholder node that will be fed actual data at every call of session.run().
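For a minimal, standalone illustration of feeding a placeholder (separate from the training graph below; the node names here are made up), a feed_dict maps placeholder nodes to numpy arrays for one particular run() call:

demo_graph = tf.Graph()
with demo_graph.as_default():
  x = tf.placeholder(tf.float32, shape=(None, 2))  # fed at run time
  doubled = x * 2.0

with tf.Session(graph=demo_graph) as session:
  # The feed_dict supplies a concrete array for x, just for this call.
  print session.run(doubled, feed_dict={x: np.array([[1.0, 2.0]])})  # [[2. 4.]]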
batch_size = 128

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))

  # Training computation.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
Let's run it:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print "Initialized"
  for step in xrange(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print "Minibatch loss at step", step, ":", l
      print "Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels)
      print "Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels)
  print "Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels)
Initialized
Minibatch loss at step 0 : 16.8091
Minibatch accuracy: 12.5%
Validation accuracy: 14.0%
Minibatch loss at step 500 : 1.75256
Minibatch accuracy: 77.3%
Validation accuracy: 75.0%
Minibatch loss at step 1000 : 1.32283
Minibatch accuracy: 77.3%
Validation accuracy: 76.6%
Minibatch loss at step 1500 : 0.944533
Minibatch accuracy: 83.6%
Validation accuracy: 76.5%
Minibatch loss at step 2000 : 1.03795
Minibatch accuracy: 78.9%
Validation accuracy: 77.8%
Minibatch loss at step 2500 : 1.10219
Minibatch accuracy: 80.5%
Validation accuracy: 78.0%
Minibatch loss at step 3000 : 0.758874
Minibatch accuracy: 82.8%
Validation accuracy: 78.8%
Test accuracy: 86.1%
Turn the logistic regression example with SGD into a 1-hidden-layer neural network with rectified linear units (tf.nn.relu()) and 1024 hidden nodes. This model should improve your validation/test accuracy.
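One possible sketch of such a graph, offered only as a hedge against getting stuck (one of many valid solutions): it reuses batch_size, image_size and num_labels from above, keeps the same optimizer, and the SGD training loop above can be reused unchanged since the node names match. The hidden-layer size is the one suggested by the exercise; the initialization scheme is simply carried over from the logistic regression example, not prescribed.

num_hidden = 1024

graph = tf.Graph()
with graph.as_default():

  # Input data: placeholders for the training minibatch, constants for the rest.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables: one weight matrix and one bias vector per layer.
  weights_1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden]))
  biases_1 = tf.Variable(tf.zeros([num_hidden]))
  weights_2 = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
  biases_2 = tf.Variable(tf.zeros([num_labels]))

  # Training computation: affine -> ReLU -> affine.
  hidden = tf.nn.relu(tf.matmul(tf_train_dataset, weights_1) + biases_1)
  logits = tf.matmul(hidden, weights_2) + biases_2
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  # Predictions: the same two-layer computation applied to each dataset.
  train_prediction = tf.nn.softmax(logits)
  valid_hidden = tf.nn.relu(tf.matmul(tf_valid_dataset, weights_1) + biases_1)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_hidden, weights_2) + biases_2)
  test_hidden = tf.nn.relu(tf.matmul(tf_test_dataset, weights_1) + biases_1)
  test_prediction = tf.nn.softmax(tf.matmul(test_hidden, weights_2) + biases_2)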