
Build Your First Neural Network: Basic Image Classification Using Keras


Image classification is one of the most important problems in machine learning. It underpins a variety of computer vision tasks, such as face recognition, character recognition, object avoidance in autonomous vehicles and many others. Convolutional neural networks (CNNs) have been used for image classification and other computer vision problems since their inception. They are called convolutional because of their convolutional layers. Keras is a high-level library that provides an easy way to get started with machine learning and neural networks. Here we will use it to implement a CNN that classifies the handwritten digits of the MNIST dataset.

Image classification is the process of determining which of a given set of classes an input image belongs to. CNNs represent a huge breakthrough in image classification. In most cases, a CNN outperforms other image classification methods and achieves near human-level accuracy. A CNN model does not simply output the name of the class the input image belongs to; instead, it gives a list of probabilities. Each entry in the list is the likelihood that the input image belongs to a certain class. For example, if we have two classes in a dataset of "cats and dogs" images, a CNN model gives us two probabilities: one for the likelihood that the input image belongs to the "dog" class, and the other for the likelihood that it belongs to the "cat" class.
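To make the "list of probabilities" idea concrete, here is a minimal sketch of the softmax function, which is a common way to turn a network's raw scores into probabilities. The raw scores below are made-up numbers for illustration, not the output of a real model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the classes "cat" and "dog".
class_names = ["cat", "dog"]
probabilities = softmax([1.2, 3.1])

# The predicted class is the one with the highest probability.
predicted = class_names[probabilities.index(max(probabilities))]
print(dict(zip(class_names, probabilities)), "->", predicted)
```

The class with the largest probability is taken as the prediction, which is why the model's output can still be reduced to a single class name when needed.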

A neural network model typically has four basic parts:
  1. Network architecture
  2. Loss function
  3. Optimizer
  4. Regularizer

1. Network architecture

Network architecture refers to the organization of layers in the network and the structure of each layer. It also describes how the nodes of one layer connect to the nodes of the next layer. A node is the basic functional unit that is repeated within a layer. A CNN model usually has convolutional layers, pooling layers, dropout layers and fully connected layers.

Convolutional layers extract features (their outputs are called activations or feature maps) from images at different levels, while pooling layers downsample and summarize these features. A dropout layer is a regularization technique that helps prevent the model from overfitting the training data.
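As a rough illustration of what "downsample and summarize" means, the sketch below applies 2x2 max pooling to a tiny hand-written feature map in plain Python. The helper name max_pool_2x2 and the numbers are hypothetical; a real network would use a library layer instead.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by keeping the maximum of each 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            block = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled

# A made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 9, 7],
    [1, 1, 3, 5],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2, and each output value summarizes one region of the input by its strongest activation.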

2. Loss function

The loss function, also called the cost function, calculates the cost of the network during each iteration of the training phase. The cost or loss of a neural network measures the difference between the actual (expected) output and the output predicted by the model. It tells us how well the network performed during that iteration. The purpose of the training phase is to minimize this loss value, and the way to do so is to change the weights in each layer of the network. This is done with the help of an optimizer.
Examples of loss functions include Mean Squared Error, which is typically used for regression, and Cross-Entropy loss, which usually gives the best performance on classification problems.
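Here is a small sketch of how cross-entropy loss can be computed for a single example with a one-hot label. The two prediction lists are invented to show that a confident, correct prediction gets a lower loss than an unsure one; the function name is my own, not a Keras API.

```python
import math

def categorical_cross_entropy(true_one_hot, predicted_probs):
    """Cross-entropy between a one-hot label and a list of predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_one_hot, predicted_probs))

true_label = [0, 0, 1]            # the correct class is index 2
confident = [0.05, 0.05, 0.90]    # hypothetical confident prediction
unsure = [0.40, 0.30, 0.30]       # hypothetical unsure prediction

loss_confident = categorical_cross_entropy(true_label, confident)
loss_unsure = categorical_cross_entropy(true_label, unsure)
print(loss_confident, loss_unsure)
```

Only the probability assigned to the correct class contributes to the sum, so the loss shrinks toward 0 as that probability approaches 1.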

3. Optimizer

An optimizer is an optimization algorithm that minimizes or maximizes an objective function. In neural networks, it is used to find a minimum of the loss function. Gradients of the loss with respect to the weights are calculated by backpropagation, which passes the error backward through the network. The gradients tell us in which direction (positive or negative) to update each weight and by how much, and the optimizer uses them to update the weights.
There are different types of optimizers. Popular ones include Adam and the different variations of the Gradient Descent algorithm. Each is suitable for different scenarios. However, Adam (adaptive moment estimation) is widely used for classification problems because it usually converges quickly to a good minimum of the loss function.
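The update rule can be sketched with plain gradient descent on a toy one-weight problem. Note that this is the simplest variant, not Adam, and the loss function (w - 3)^2, the starting weight and the learning rate are all assumptions chosen for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # initial weight
learning_rate = 0.1   # step size

for step in range(100):
    # Move the weight against the gradient to reduce the loss.
    w -= learning_rate * gradient(w)

print(round(w, 4), round(loss(w), 8))
```

The sign of the gradient gives the direction of the update and its magnitude, scaled by the learning rate, gives the size of the step.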

4. Regularizer

A regularizer is not a mandatory component of a neural network, but it is good practice to use one because it helps prevent the model from overfitting. Overfitting means a larger generalization error: an overfit model is extremely accurate on the training data but performs poorly on data that it has never seen before. There are different regularization techniques, such as dropout, L1 and L2 regularization. To keep our model from overfitting the training data, we will add a dropout layer to it.
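To give a feel for what dropout does during training, here is a minimal sketch of the "inverted dropout" variant in plain Python. The function name, the rate of 0.5 and the dummy activations are assumptions for illustration only.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected sum stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)       # seeded so the run is repeatable
activations = [1.0] * 10     # dummy layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Because a different random subset of nodes is silenced on every training step, the network cannot rely too heavily on any single node. At test time dropout is switched off.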

That's enough theory. Let's go through the code step by step.

1. Import keras library:

import keras

2. Load MNIST dataset:

Keras provides an easy-to-use API to download basic datasets such as MNIST, CIFAR-10, CIFAR-100 and Fashion MNIST. It takes just two lines to load the entire dataset into local memory.

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

3. Define some global variables

batch_size = 200
input_shape = [-1, 28,28,1]

4. Pre-process data

In pre-processing, we will normalize the images, convert the labels to categorical format (also called one-hot encoding), and reshape the images. Normalization brings the pixel values into the range 0-1. It is not strictly necessary, but it helps to improve accuracy. The labels, however, need to be converted to categorical format, because there are 10 classes in MNIST and, as discussed in the introduction above, a CNN gives a list of probabilities.

x_test = x_test/255.0
x_train = x_train/255.0

MNIST labels are single digits ranging from 0 to 9. In one-hot encoding, each digit is converted to an array of 10 values that is 1 only at the index equal to the digit and 0 everywhere else. For example, 2 is converted to [0,0,1,0,0,0,0,0,0,0] and 3 is converted to [0,0,0,1,0,0,0,0,0,0].
One-hot encoding tells the model that, for an image of the digit 3 for instance, it should give the maximum probability at index 3. It sounds a little hard, but keras has a utils module which saves us time.

y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
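If you are curious what one-hot encoding does under the hood, the sketch below reproduces the idea with NumPy alone. The helper one_hot is my own illustration, not the Keras implementation, and the three labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([2, 3, 9])   # hypothetical digit labels
encoded = one_hot(labels)

print(encoded.shape)   # (3, 10)
print(encoded[0])      # the row for digit 2

# Taking the index of the 1 in each row recovers the original labels.
recovered = encoded.argmax(axis=1)
```

Each row has exactly one 1, so argmax along the rows gives back the original digit.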

A CNN also takes the number of channels into account in its convolution operations, but MNIST images are provided in 28x28 format. All these images are grayscale, so they have only one channel, and we will reshape them to [-1, 28, 28, 1]. The -1 tells reshape to infer that dimension from the data, so all the images in the array are kept.
As the legendary Andrew Ng says: "If you don't understand, don't worry about it."

x_train = x_train.reshape(input_shape)
x_test = x_test.reshape(input_shape)
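The effect of the -1 is easiest to see on a small stand-in array. The sketch below uses random pixel values instead of the real MNIST data (an assumption, so it runs without downloading anything) and shows both the normalization and the reshape.

```python
import numpy as np

# Stand-in for a tiny batch of MNIST images: 5 grayscale images of 28x28 pixels.
fake_images = np.random.randint(0, 256, size=(5, 28, 28)).astype("uint8")

# Normalize pixel values from 0-255 to 0-1.
scaled = fake_images / 255.0

# -1 lets NumPy infer the number of images; the trailing 1 is the channel axis.
reshaped = scaled.reshape([-1, 28, 28, 1])

print(fake_images.shape, "->", reshaped.shape)  # (5, 28, 28) -> (5, 28, 28, 1)
```

The number of images is unchanged; only a channel axis of size 1 is added at the end.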

5. Build model

This is where we define our network architecture. Keras' Sequential model API is pretty easy to understand. It creates a model by stacking layers on top of each other in the order they are provided. All we need to do is create an object of the Sequential class and add layers to it using the add method. There is also an option to pass the layers to the constructor, but I prefer the add method because it gives a clue about how the input passes through the network.

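model = keras.Sequential()
# This line was cut off in the post; input_shape=(28, 28, 1) is an assumed completion.
model.add(keras.layers.Conv2D(6, (3,3), activation=keras.activations.relu, input_shape=(28, 28, 1)))
model.add(keras.layers.Conv2D(16, (3,3), activation=keras.activations.relu))
model.add(keras.layers.Conv2D(120, (3,3), activation=keras.activations.relu))
# Assumed step: flatten the feature maps so the Dense layers output one vector per image.
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(84, activation=keras.activations.relu))
# Assumed step: the dropout layer promised earlier; the rate 0.5 is a guess.
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))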
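It helps to track how the image size changes as it passes through the convolutional layers. The sketch below assumes the three 3x3 convolutions listed above run back to back with the Keras defaults (stride 1, no padding) and no pooling in between; the helper conv_output_size is my own.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

size = 28            # MNIST images are 28x28
sizes = [size]
for filters in (6, 16, 120):
    # Each 3x3 convolution without padding trims one pixel from every side.
    size = conv_output_size(size, kernel_size=3)
    sizes.append(size)

print(sizes)  # [28, 26, 24, 22]

# Number of values per image after the last convolution (22 * 22 * 120 filters).
flattened = size * size * 120
print(flattened)  # 58080
```

Under these assumptions the last convolution produces 58,080 values per image, which is what a fully connected layer receives once the feature maps are flattened.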

Remember what an optimizer and a loss function are? We need an optimizer to update the weights and a loss function to calculate the cost or loss of the network during the training phase.

optimizer = keras.optimizers.Adam()
# The compile call was cut off in the post; metrics=['accuracy'] is an assumed completion.
model.compile(optimizer=optimizer, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

6. Training

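Our model is now ready to enter the training phase. We call the fit function and provide the training data we want our model to fit. It also needs some other information, such as the batch size, the number of epochs and the verbosity level.

# epochs is not defined in the steps above; 5 is an assumed value.
epochs = 5
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)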
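A quick calculation shows how batch size relates to the number of weight updates. This sketch assumes the standard MNIST training split of 60,000 images and the batch_size of 200 defined earlier; the helper name and the example epoch count are hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

num_train_images = 60000   # size of the standard MNIST training split
batch_size = 200

steps = steps_per_epoch(num_train_images, batch_size)
print(steps)  # 300

# Total weight updates for an example run of 5 epochs (the epoch count is an assumption).
total_updates = steps * 5
print(total_updates)  # 1500
```

A smaller batch size means more updates per epoch, while a larger one means fewer but smoother updates.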

7. Testing

Once all the epochs are completed and the training phase ends, we evaluate our model to see how good it is at classification.

results = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=0)
print('{}: {:.2f}, {}: {:.2f}'.format(model.metrics_names[0], results[0],
                                      model.metrics_names[1], results[1]))
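Assuming the second reported value is classification accuracy, it can be thought of as the fraction of test images whose most probable class matches the true label. Here is a minimal sketch with made-up predictions for a 3-class problem; the function name is my own.

```python
def accuracy(predicted_probs, true_labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = 0
    for probs, label in zip(predicted_probs, true_labels):
        predicted_class = probs.index(max(probs))
        if predicted_class == label:
            correct += 1
    return correct / len(true_labels)

# Hypothetical model outputs for four images and their true labels.
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
labels = [0, 1, 2, 2]

score = accuracy(predictions, labels)
print(score)  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.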

8. Save trained model

To use the trained model for classification next time, it needs to be saved, because it makes no sense to retrain a model each time we need it.

model.save('model.h5')

To use an already trained and saved model, load it with keras' load_model function. If you have a saved model, you don't need steps 5 and 6.

new_model = keras.models.load_model('model.h5')

Note: In this post, I have skipped some details to make things easy to understand. We will cover those details in upcoming posts.
If you have any issues with the code, feel free to ask in the comments. I will try to reply promptly.

