Showing posts with label Computer Vision. Show all posts

Build Your First Neural Network: Basic Image Classification Using Keras


Image classification is one of the most important problems in machine learning. It underpins a variety of computer vision applications, such as face recognition, character recognition, obstacle avoidance in autonomous vehicles and many others. Convolutional Neural Networks (CNNs) have been used for image classification and other computer vision problems since their inception. The name comes from the convolutional layer, the network's core building block. Keras is a high-level library that provides an easy way to get started with machine learning and neural networks. We will use it here to implement a CNN that classifies the handwritten digits of the MNIST dataset.

Image classification is the process of determining which of the given classes an input image belongs to. CNNs represent a huge breakthrough in image classification. In most cases, CNNs outperform other image classification methods and come close to human-level accuracy. A CNN model does not simply output the name of the class the input image belongs to; it gives a list of probabilities. Each entry in the list shows the likelihood that the input image belongs to a certain class. For example, if we have two classes in a dataset of "cats and dogs" images, a CNN model gives us two probabilities: one for the likelihood that the input image belongs to the "dog" class, and the other for the likelihood that it belongs to the "cat" class.
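
To make the idea of a probability list concrete, here is a minimal sketch in plain Python. The raw scores and class names are made-up values for illustration, and softmax is assumed as the function that turns scores into probabilities (it is the activation used in the last layer of the model built later in this post).

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores a model might produce for one input image.
class_names = ["cat", "dog"]
raw_scores = [1.2, 3.1]

probabilities = softmax(raw_scores)
for name, p in zip(class_names, probabilities):
    print(f"{name}: {p:.3f}")

# The predicted class is the one with the highest probability.
predicted = class_names[probabilities.index(max(probabilities))]
print("predicted:", predicted)
```

The model never says "dog" directly; we pick the class whose probability is largest.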

There are four basic parts of any neural network model. 
  1. Network architecture 
  2. Loss function 
  3. Optimizer
  4. Regularizer.

1. Network architecture

Network architecture refers to the organization of layers in the network and the structure of each layer. It also describes the connectivity between the nodes of one layer and the nodes of the next layer. A node is a basic functional unit used repeatedly in a layer. A CNN model usually has convolutional layers, pooling layers, dropout layers and fully connected layers.

Convolutional layers extract features, also called activations or feature maps, from images at different levels, while pooling layers downsample and summarize these features. A dropout layer applies a regularization technique that helps prevent the model from overfitting the training data.
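
The following sketch shows, on a toy example, what "extracting a feature" and "downsampling" mean. It is plain Python rather than Keras, and the 6x6 image and the edge-detecting kernel are hand-picked assumptions; in a real CNN the kernel values are learned during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 6x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)  # 4x4: strong response around the edge
pooled = max_pool(feature_map)       # 2x2: summarized version

print(feature_map)
print(pooled)
```

The feature map is large only where the edge is, and pooling keeps that information while shrinking the map to a quarter of its size.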

2. Loss function

A loss function, also called a cost function, calculates the cost of the network during each iteration of the training phase. The cost or loss of a neural network measures the difference between the actual (expected) output and the output predicted by the model. It tells us how well the network performed during that iteration. The purpose of the training phase is to minimize this loss value. The only way to minimize the loss meaningfully is to change the weights in each layer of the network, which is done with the help of an optimizer.
Examples of loss functions include Mean Squared Error, which is typically used for regression, and Cross-Entropy loss, which usually gives the best performance on classification problems.
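
As a rough illustration, the sketch below computes both losses by hand for a three-class problem. The target and the two predictions are made-up numbers, and the functions are simplified stand-ins for illustration, not the Keras implementations.

```python
import math

def categorical_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly 0.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

def mean_squared_error(target, predicted):
    """Average of the squared differences between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)

target = [0.0, 1.0, 0.0]            # the true class is index 1
good_guess = [0.05, 0.90, 0.05]     # confident and correct
bad_guess = [0.70, 0.20, 0.10]      # confident and wrong

for name, guess in [("good", good_guess), ("bad", bad_guess)]:
    ce = categorical_cross_entropy(target, guess)
    mse = mean_squared_error(target, guess)
    print(f"{name}: cross-entropy={ce:.4f}, mse={mse:.4f}")
```

Both losses are lower for the good guess, but cross-entropy punishes the confident wrong answer much more strongly, which is one reason it is preferred for classification.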

3. Optimizer

An optimizer is an optimization algorithm that helps to minimize or maximize an objective function. In neural networks it is used to find a minimum of the loss function. Based on the loss value and the existing weights, gradients are calculated by propagating the loss back through the network (backpropagation). The gradients tell us in which direction (positive or negative) to update each weight and by how much, and the optimizer uses them to update the weights.
There are different types of optimizers. Popular ones include Adam and different variations of the Gradient Descent algorithm. Each is suitable for different scenarios. However, Adam (adaptive moment estimation) is widely used for classification problems because it usually converges quickly to a good local minimum of the loss function.
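
To see the update rule in action, here is a minimal sketch of plain gradient descent (not Adam) on a toy loss with a single weight. The loss function, the starting weight and the learning rate are assumptions chosen only so that the result is easy to check.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0                      # arbitrary starting weight
history = [loss(w)]

for step in range(50):
    # The sign of the gradient gives the direction, its size the amount.
    w = w - learning_rate * gradient(w)
    history.append(loss(w))

print(f"final weight: {w:.4f}, final loss: {history[-1]:.8f}")
```

Each step moves the weight against the gradient, so the loss shrinks until the weight settles near 3. Adam follows the same idea but adapts the step size for every weight.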

4. Regularizer

A regularizer is not a mandatory component of a neural network, but it is good practice to use one because it helps prevent the model from overfitting. Overfitting means a larger generalization error: an overfit model performs extremely well on the training data but poorly on data that it has never seen before. There are different regularization techniques, such as dropout, L1 and L2 regularization. To keep our model from overfitting the training data, we will add a dropout layer to it.
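
Here is a small sketch of what a dropout layer does, written in plain Python. The helper function and the all-ones activations are illustrative assumptions; the scaling by 1 / (1 - rate) during training mirrors how Keras' Dropout layer behaves.

```python
import random

def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training only."""
    if not training:
        # At test time the layer passes its input through unchanged.
        return list(activations)
    keep = 1.0 - rate
    # Kept values are scaled up so the expected sum stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)       # fixed seed so the run is repeatable
activations = [1.0] * 10

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("training :", train_out)
print("inference:", test_out)
```

Because a different random subset of units is switched off at every training step, the network cannot rely on any single unit, which tends to reduce overfitting.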

That's enough for theory. Let's see the code stepwise.

1. Import keras library:

import keras

2. Load MNIST dataset:

Keras provides an easy-to-use API to download basic datasets such as MNIST, CIFAR-10, CIFAR-100 and Fashion-MNIST. It takes just two lines to load the entire dataset into local memory.

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

3. Define some global variables

batch_size = 200
epochs=5
input_shape = [-1, 28,28,1]

4. Pre-process data

In pre-processing, we will normalize the images, convert the labels to categorical format (also called one-hot encoding), and reshape the images. Normalization brings pixel values into the range 0-1. It is not strictly necessary, but it helps to improve accuracy. The labels, however, need to be converted to categorical format because there are 10 classes in MNIST and, as discussed in the introduction above, a CNN gives a list of probabilities.
x_test = x_test/255.0
x_train = x_train/255.0
 
MNIST labels are single digits ranging from 0 to 9. In one-hot encoding, each digit is converted to an array of 10 values that has a 1 only at the index equal to the digit itself. For example, 2 is converted to [0,0,1,0,0,0,0,0,0,0] and 3 is converted to [0,0,0,1,0,0,0,0,0,0].
One-hot encoding tells the model that, for instance, for an image of the digit 3 it should give the maximum probability at index 3 (counting from 0). It sounds a little hard, but keras has a utils module which saves us time.

y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
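
If you are curious what happens inside, the sketch below does the same conversion by hand in plain Python. The helper to_one_hot is a hypothetical name used only for illustration; the real to_categorical returns a NumPy array rather than nested lists.

```python
def to_one_hot(labels, num_classes):
    """Convert integer labels to one-hot rows (lists of 0s with a single 1)."""
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1       # the 1 sits at the index equal to the digit
        encoded.append(row)
    return encoded

encoded = to_one_hot([2, 3], num_classes=10)
for digit, row in zip([2, 3], encoded):
    print(digit, "->", row)
```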

A CNN also takes the number of channels into account in its convolution operations, but MNIST images are provided in 28x28 format. All these images are grayscale, so they have only one channel, and we will reshape them to [-1, 28, 28, 1]. The -1 tells reshape to infer that dimension, which here is the number of images in the array.
If you don't understand, don't worry about it—Legendary Andrew Ng
x_train = x_train.reshape(input_shape)
x_test = x_test.reshape(input_shape)
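
The sketch below shows how the -1 gets resolved. The function resolve_shape is a hypothetical helper written for this explanation, not part of NumPy or Keras; it only reproduces the arithmetic, using the fact that MNIST has 60,000 training and 10,000 test images.

```python
def resolve_shape(total_values, shape):
    """Replace a single -1 in `shape` so the element count matches."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_values % known != 0:
        raise ValueError("array size is not compatible with this shape")
    inferred = total_values // known
    if -1 not in shape and inferred != 1:
        raise ValueError("array size does not match this shape")
    return [inferred if dim == -1 else dim for dim in shape]

input_shape = [-1, 28, 28, 1]

# 60,000 training images and 10,000 test images of 28x28 pixels each.
train_shape = resolve_shape(60000 * 28 * 28, input_shape)
test_shape = resolve_shape(10000 * 28 * 28, input_shape)

print(train_shape)  # [60000, 28, 28, 1]
print(test_shape)   # [10000, 28, 28, 1]
```

This is why the same input_shape works for both arrays even though they hold different numbers of images.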

5. Build model

Here is where we define our network architecture. Keras' Sequential model API is pretty easy to understand. It creates a model by stacking layers on top of each other in the order they are provided. All we need to do is create an object of the Sequential class and add layers to it using the add method. There is also an option to pass the layers to the constructor, but I prefer the add method because it gives a clue about how the input passes through the network.

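model = keras.Sequential()
# Feature extraction: convolutional layers with max pooling in between
model.add(keras.layers.Conv2D(6, (3,3), activation=keras.activations.relu,
                              input_shape=[28,28,1]))
model.add(keras.layers.MaxPool2D())
model.add(keras.layers.Conv2D(16, (3,3), activation=keras.activations.relu))
model.add(keras.layers.MaxPool2D())
model.add(keras.layers.Conv2D(120, (3,3), activation=keras.activations.relu))
# Classification: flatten the feature maps and feed them to dense layers
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(84, activation=keras.activations.relu))
# Dropout randomly disables half of the units during training (regularization)
model.add(keras.layers.Dropout(0.5))
# One output per digit class; softmax turns the outputs into probabilities
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))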
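
To get a feeling for how the image shrinks as it passes through the model above, the sketch below traces the feature map sizes by hand. It assumes the Keras defaults used in the model: 'valid' padding and stride 1 for Conv2D, and a 2x2 window with stride 2 for MaxPool2D. The helper names are made up for this illustration.

```python
def conv_output(size, kernel=3):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Output width/height of non-overlapping pooling (leftovers are dropped)."""
    return size // pool

size = 28                    # MNIST images are 28x28
trace = []

size = conv_output(size)
trace.append(("Conv2D(6)", size, size, 6))
size = pool_output(size)
trace.append(("MaxPool2D", size, size, 6))
size = conv_output(size)
trace.append(("Conv2D(16)", size, size, 16))
size = pool_output(size)
trace.append(("MaxPool2D", size, size, 16))
size = conv_output(size)
trace.append(("Conv2D(120)", size, size, 120))

# Flatten turns the last feature maps into one long vector.
flattened = size * size * 120

for name, height, width, channels in trace:
    print(f"{name:12s} -> {height}x{width}x{channels}")
print("Flatten      ->", flattened)
```

You can compare these numbers with the output of model.summary() to check that the network is wired the way you expect.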

Remember what an optimizer and a loss function are? We need an optimizer to update the weights and a loss function to calculate the cost or loss of the network during the training phase.
optimizer = keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss=keras.losses.categorical_crossentropy,
              metrics=['accuracy'])

6. Training

Our model is now ready to enter the training phase. We will call the fit function and provide the training data we want our model to fit. Some other information is also needed, such as the batch size, the number of epochs and the verbosity level.
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)
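
A quick back-of-the-envelope calculation shows what these settings mean in practice. The sketch assumes the standard MNIST training set of 60,000 images and the batch size and epoch count defined in step 3.

```python
import math

num_train_images = 60000     # size of the MNIST training set
batch_size = 200
epochs = 5

# One weight update happens per batch; the last batch may be smaller.
steps_per_epoch = math.ceil(num_train_images / batch_size)
total_updates = steps_per_epoch * epochs

print("steps per epoch:", steps_per_epoch)  # 300
print("total weight updates:", total_updates)  # 1500
```

So with verbose set to 1, the progress bar should count up to 300 in each of the 5 epochs.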

7. Testing

Once all the epochs are completed and the training phase ends, we evaluate our model to see how good it is at classification.

results = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=0)
print('{}: {:.2f}, {}: {:.2f}'.format(model.metrics_names[0], results[0],
                                      model.metrics_names[1], results[1]))
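
The accuracy reported here boils down to a simple comparison, sketched below in plain Python. The predictions and labels are made-up values for a three-class problem, and the helper functions are illustrative, not what Keras runs internally.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose most likely class matches the label."""
    correct = sum(
        argmax(p) == argmax(t)
        for p, t in zip(probabilities, one_hot_labels)
    )
    return correct / len(one_hot_labels)

# Hypothetical model outputs for four images and their one-hot labels.
predictions = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],   # wrong: the label says class 1
]
labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

acc = accuracy(predictions, labels)
print(f"accuracy: {acc:.2f}")  # 0.75
```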

8. Save trained model

To use the trained model for classification next time, it needs to be saved, because it is insane to retrain a model each time we need to use it.

model.save('model.h5')

To use the already trained and saved model, load it with keras' load_model function. If you have a saved model, you don't need steps 5 and 6.

new_model = keras.models.load_model('model.h5')

Note: In this post, I have skipped some details to make things easy to understand. However, we will see those details in upcoming posts.
If you have any issue with the code, feel free to ask in the comments. I will try to reply instantly.

    7 Awesome Examples of Computer Vision


    Computer vision examples
    Photo by David Travis on Unsplash

    Early experiments in computer vision started in the 1950s, and it was first put to use to distinguish between handwritten and typed text in the 1970s. Today the applications for computer vision have grown enormously. In this article, we will share with you some of the recent implementation trends of computer vision.

    What's Computer Vision (CV)?

    Computer vision is the use of computers to process visual data and then draw conclusions from it or gain an understanding of the situation and the surroundings. One of the factors behind the growth of computer vision is the amount of data available today, which is used to train and improve computer vision systems.
    We have a huge amount of visual data in the form of images and videos produced by the built-in cameras of our phones alone. However, while visual data can include photos and videos, it can also come from other sources and sensors. Along with the massive amount of visual information (over 3 billion pictures are shared online every day), the computing power required to analyze that information is now accessible and cheaper. As the field of computer vision has grown with new hardware and algorithms, so have the accuracy rates for object identification. In under a decade, today's systems have gone from 50 percent to 99 percent accuracy, making them more accurate than humans at quickly reacting to visual inputs.

    How Does Computer Vision Work?

    One of the main components in realizing all of the capabilities of artificial intelligence is to give machines the power of vision. To emulate human sight, machines need to acquire, process, analyze and understand images. Progress toward this milestone was made possible by the learning process of neural networks. It starts with a dataset containing information that helps the system learn a particular topic. If the goal is to detect videos of cats, as it was for Google in 2012, the dataset used by the neural networks should contain videos and images with cats as well as examples without cats. Each image needs to be tagged with metadata that indicates the right answer.
    When a neural network runs through the data and signals that it has found an image with a cat, it is the feedback it receives about whether it was correct that helps it improve. Neural networks use pattern recognition to distinguish the different pieces of an image. Rather than a programmer specifying the attributes that make a cat, such as having a tail and whiskers, the machines learn from millions of pictures.

    7 Awesome Examples of Computer Vision

     Imagine all the things human sight allows and you can begin to recognize the endless applications for computer vision.

    1. Self-Driving Vehicle

    Computer vision is essential to enabling self-driving cars. Manufacturers like Tesla, BMW, Volvo, and Audi use lidar, radar, and other sensors to obtain images so that their vehicles can detect objects, lane markings, signs and traffic signals and drive safely.

    2. Google Translate app

    To read signs in a language that you don't understand, all you have to do is point your phone's camera at them and let the Google Translate app tell you what they mean in your preferred language almost immediately. It uses optical character recognition to read the image and augmented reality to overlay an accurate translation.

    3. Facial recognition

    China is definitely on the cutting edge of using facial recognition technology, and they use it for police work, payment portals, security checkpoints at the airport and even to dispense toilet paper and prevent theft of the paper at Tiantan Park in Beijing, among many other applications.

    4. Healthcare

    Considering that 90 percent of all medical data is image based, there are many applications for computer vision in medicine, from enabling new diagnostic methods that analyze X-rays, mammograms and other scans to monitoring patients so that issues are identified sooner and assisting with surgery. Our health care institutions, professionals and patients will benefit from computer vision now, and even more in the future as it is rolled out in healthcare.

    5. Professional sports tracking

    Ball and puck tracking in sports has been common for a while now, but computer vision is also helping with play and strategy analysis, player ratings and performance, and tracking brand sponsorship visibility in sports broadcasts.

    6. Agriculture

    At CES 2019, John Deere featured a semi-autonomous combine harvester that uses artificial intelligence and computer vision to analyze grain quality as it is harvested and to find the optimal route through the crops. There is also potential for computer vision to identify weeds so that herbicides can be sprayed directly on them instead of on the crops. This is expected to reduce the amount of herbicide needed by 90 percent.

    7. Manufacturing 

    Computer vision is helping manufacturers operate more intelligently and effectively in various ways. Predictive maintenance is one example, where equipment is monitored so that someone can intervene before a breakdown leads to expensive downtime. Packaging and product quality are monitored, and defective products can also be reduced with computer vision.
    There are already a huge number of applications for computer vision, and the technology is still young. As people and machines continue to work together, the human workforce will be freed up to focus on other tasks because the machines will automate processes that rely on image recognition.
