This blog post covers a basic example of a neural network, using TensorFlow and Keras in Python. The notebook can also be viewed on GitHub.
TensorFlow and Keras
TensorFlow was developed at Google for internal machine learning tasks and applied to applications like speech recognition, Search, and Gmail. It was released to the public as an open source project in 2015. The core library is written in C++ and is typically used through its Python API. TensorFlow can be applied to a variety of problems, such as image recognition, language processing, and self-driving cars. There are various alternatives to TensorFlow, such as Theano and Torch.
We are going to use Keras in this notebook, with TensorFlow as the backend engine. Keras is a high-level wrapper that can be used with either TensorFlow or Theano, and it simplifies common operations. Its API is similar to scikit-learn's, which makes it easier to pick up, while TensorFlow or Theano does the processing in the background.
The data
In this example we will be looking at the MNIST database (a subset of a larger set collected by the National Institute of Standards and Technology). This is a classic dataset containing 60,000 training images, 10,000 test images, and the corresponding training and test labels. The images are 28 x 28 pixel grayscale images of handwritten digits, divided into 10 categories (the digits 0 to 9).
The versions
In this example I am using Keras v2.1.4 and TensorFlow v1.5.0 with GPU support (using NVIDIA CUDA). Running the examples on a GPU can speed up the training process considerably.
# To avoid warnings
import warnings
warnings.filterwarnings('ignore')
# Importing keras and tensorflow, and printing the versions
import keras
print('Keras: {}'.format(keras.__version__))
import tensorflow as tf
print('TensorFlow: {}'.format(tf.__version__))
# Importing MNIST data set
from keras.datasets import mnist
# Assigning the data from the data set
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('Training images shape: {}'.format(X_train.shape))
print('Training labels shape: {}'.format(y_train.shape))
print('Test images shape: {}'.format(X_test.shape))
print('Test labels shape: {}'.format(y_test.shape))
As we can see, the images are 28 x 28 pixels. Let's look at the first ten labels in the data set, and visualize the first ten digits by plotting their pixel values.
# First 10 labels
y_train[:10]
# Importing matplotlib library for plotting
%matplotlib inline
import matplotlib.pyplot as plt
# Defining the figure for plotting, using 'xticks':[] and 'yticks':[]
# to avoid the ticks on axes
fig, axes = plt.subplots(1, 10, figsize=(10, 1),
                         subplot_kw={'xticks': [], 'yticks': []},
                         gridspec_kw=dict(hspace=0.1, wspace=0.1))
# Looping through the images and their corresponding labels
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i], cmap='binary')
    ax.text(0.05, 0.05, str(y_train[i]), transform=ax.transAxes, color='orangered')
Neural Network with Keras and Tensorflow
Models are defined as a sequence of layers. We start by importing Sequential from the keras.models module, and then create a Sequential model.
The network below consists of a sequence of two Dense layers, which are fully connected. The first layer uses the 'relu' rectifier activation function (more on relu on Medium). The second layer uses 'softmax' as its activation function, which returns an array of 10 probability scores indicating which class the current digit belongs to.
We will flatten each 28 x 28 pixel image into an input vector of 784 values.
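For intuition, both activations are simple functions. Here is a minimal NumPy sketch (an illustration, not the Keras implementation) of relu and softmax:

```python
import numpy as np

def relu(x):
    # relu passes positive values through unchanged and zeroes out negatives
    return np.maximum(0, x)

def softmax(x):
    # subtract the max for numerical stability, exponentiate, then normalize
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(relu(scores))        # negatives become 0
probs = softmax(scores)
print(probs, probs.sum())  # non-negative scores that sum to 1
```

Because softmax outputs sum to 1, the second layer's 10 outputs can be read as a probability for each digit class.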
# Importing Sequential and Dense from keras
from keras.models import Sequential
from keras.layers import Dense
# Creating the model and adding the layers
model = Sequential()
model.add(Dense(784, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(10, activation='softmax'))
Now that we have the model defined, it is time to compile it. Under the covers, TensorFlow is used for the compilation. We need to specify a few additional properties required to train the network.
- A loss function: measures the performance of the model by comparing the predicted y to the actual y. We will be using 'categorical_crossentropy' as the loss function for our model. 'categorical_crossentropy' requires the target to be in a categorical format: in our case there are 10 categories, so each target is a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class. See the list of loss functions on the Keras website.
- An optimizer: searches through different weights for the network and determines how the model is updated based on the loss function. We will be using the Adam optimizer, an extension of stochastic gradient descent. See the list of optimizers on the Keras website, and more on Adam at machinelearningmastery.
- Metrics: functions used to judge the performance of the model. In this case we will collect and report the classification accuracy.
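To make the loss concrete, here is a minimal NumPy sketch (an illustration, not the Keras implementation) of categorical crossentropy for a single one-hot target. The loss is simply the negative log of the probability the model assigned to the true class:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is a one-hot vector, y_pred a vector of predicted probabilities;
    # clip to avoid log(0), then take the negative log-probability of the true class
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.zeros(10)
y_true[5] = 1                    # the digit is a 5
y_pred = np.full(10, 0.05)
y_pred[5] = 0.55                 # the model assigns 55% probability to '5'
print(categorical_crossentropy(y_true, y_pred))  # ≈ 0.598, i.e. -log(0.55)
```

The more confident the model is in the correct class, the closer the loss gets to zero.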
# Compiling the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The data needs to be preprocessed before training the model. First we reshape each 28 x 28 pixel image into a vector of 784 values. Then we normalize the pixel values from the 0-255 range to the 0-1 range by dividing them by 255. We also convert the labels to a categorical (one-hot) format. After the conversion, we will print the first training label to demonstrate the categorical format.
# Reshaping the data
X_train = X_train.reshape((X_train.shape[0], 28 * 28))
X_test = X_test.reshape((X_test.shape[0], 28 * 28))
# Converting the data to float32 type, and normalizing
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
# Importing to_categorical to convert the labels
from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# Displaying the first row of the training labels, corresponding to '5'
y_train[0]
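What to_categorical does is turn each integer label into a one-hot vector. A minimal NumPy equivalent (a sketch, not Keras's implementation) makes this explicit:

```python
import numpy as np

def one_hot(labels, num_classes):
    # build a matrix of zeros and set a 1 at each label's index
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1
    return out

# the first three MNIST training labels are 5, 0, 4
print(one_hot([5, 0, 4], 10)[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```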
Once all the above steps are taken, we can fit the model to the training data and evaluate it on the test data. The model is fit over 5 epochs (an epoch is one full pass over the training set) with a batch size of 150 (the number of training samples processed per weight update); with 60,000 training samples, each epoch therefore consists of 400 batches.
# Fitting the model
model.fit(X_train, y_train, epochs=5, batch_size=150)
# Evaluating the model on the test set
score = model.evaluate(X_test, y_test)
# Printing the accuracy score of the model over the test set
print('\nModel {}: {}'.format(model.metrics_names[1], score[1]))
The model accuracy on the test data turns out to be about 98.1%. The slight gap between the training accuracy and the test accuracy is due to some overfitting.
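For reference, the accuracy metric amounts to comparing the argmax of the predicted probability vectors against the argmax of the one-hot labels. A minimal NumPy sketch with made-up predictions (not the Keras implementation):

```python
import numpy as np

def accuracy(y_pred_probs, y_true_onehot):
    # a prediction is correct when the most probable class matches the label
    return np.mean(y_pred_probs.argmax(axis=1) == y_true_onehot.argmax(axis=1))

# two toy predictions over 3 classes: the first is correct, the second is not
y_pred_probs = np.array([[0.1, 0.8, 0.1],
                         [0.6, 0.3, 0.1]])
y_true_onehot = np.array([[0, 1, 0],
                          [0, 0, 1]])
print(accuracy(y_pred_probs, y_true_onehot))  # 0.5
```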
Conclusion
In this notebook we had a quick introduction to neural networks using TensorFlow with Keras. We covered importing the data, preprocessing it, creating the model, setting its parameters, training the model, and evaluating it on the test data set.