Image Classification using Squidify Vision
Classification based on deep learning is a method in which an image is assigned a set of confidence values. These confidence values indicate how likely it is that the image belongs to each of the distinguished classes.
With Squidify Vision, users can build a classification model from scratch for their specific application. In this article, we create and train a classification model for 325 bird species.
In general, the workflow consists of four basic steps: defining the classes and importing labeled images, splitting the data into training and validation sets, training the model, and testing the trained model on new images.
To create a classification model, you first define the classes to be distinguished; these serve as the identifiers of the objects. To train a model to recognize a class, it is recommended to feed it as much labeled image data as possible. To do so, users can add classes and import images for each class from the classification workflow panel. When the model has many classes, importing images one class at a time can be tedious. Instead, users can generate all classes with their respective image data in one click by organizing the images into one subdirectory per class within a folder.
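To make the folder-per-class layout concrete, here is a minimal sketch of how such a directory can be turned into a class index. The helper `build_class_index` is hypothetical and for illustration only; Squidify Vision's one-click import is assumed to work along similar lines.

```python
from pathlib import Path

def build_class_index(root):
    """Map each subdirectory name (the class label) to its JPG image files."""
    root = Path(root)
    index = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        index[class_dir.name] = sorted(
            f.name for f in class_dir.iterdir() if f.suffix.lower() == ".jpg"
        )
    return index
```

With a layout such as `birds/robin/*.jpg` and `birds/sparrow/*.jpg`, `build_class_index("birds")` yields one class per subdirectory, each with its list of images.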
In this example, we use 47,332 images covering 325 bird species classes for the training process. All images are 224 x 224 x 3 color images in JPG format. We assign 70% of these images to training and 15% to the validation process.
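A 70/15 split like the one above can be sketched as follows. This is illustrative only; the function name and the fixed seed are assumptions, and Squidify Vision is expected to handle the split internally.

```python
import random

def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle and partition items into train, validation, and remainder lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 47,332 images, this assigns 33,132 images to training and 7,099 to validation, leaving the remainder for testing.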
Convolutional neural networks and image classification
A convolutional neural network (CNN) consists of a number of layers, or filters, arranged and connected in a specific way. In general, each layer is a building block performing a specific task: it can be seen as a container that receives input, transforms it according to a function, and passes the output on to the next layer. Different types of layers implement different functions. To train a network for a specific task, a loss function is added. Loss functions differ depending on the task, but they all work according to the same principle: the loss function compares the network's prediction with the given ground truth for the image and penalizes deviations. The filter weights are then updated in such a way that the loss function is minimized. Thus, when training the network for a specific task, one strives to minimize the loss (an error function) in the hope that doing so also improves the performance measure.
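The loss principle described above can be illustrated with cross-entropy, the usual loss for classification: the network's raw outputs are turned into confidences, and low confidence on the correct class is penalized. This is a generic sketch with made-up values, not Squidify Vision's internal API.

```python
import math

def softmax(logits):
    """Turn raw network outputs into confidence values that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Penalize low confidence assigned to the ground-truth class."""
    return -math.log(softmax(logits)[true_class])
```

A prediction that is confidently correct yields a small loss, while a confidently wrong one yields a large loss; minimizing this loss over the training set is what updates the filter weights.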
There are two important hyperparameters: the learning rate λ, which determines the weight of the gradient in the update of the loss function's arguments (the filter weights), and the momentum µ, which lies in the interval [0, 1]. A visualization is given in the figure below. A learning rate that is too large may cause the algorithm to diverge, while a very small learning rate takes unnecessarily many steps. It is therefore customary to start with a larger learning rate and reduce it during training. With a momentum of µ = 0, the momentum method has no influence, and only the gradient determines the update vector.
Example of the learning rate and the momentum during an update step. The gradient step: the learning rate λ times the gradient g (λg, dashed lines). The momentum step: the momentum µ times the previous update vector v (µv, dotted lines). Together they form the actual step: the update vector v (solid lines).
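The update step in the figure can be written out as a short sketch: the new update vector combines the momentum step µ·v with the gradient step λ·g. The function below is illustrative, not Squidify Vision code.

```python
def momentum_step(weight, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update; returns (new_weight, new_velocity)."""
    velocity = momentum * velocity - lr * grad  # µv (momentum) minus λg (gradient)
    weight = weight + velocity                  # apply the update vector v
    return weight, velocity
```

With `momentum=0.0` the velocity term vanishes and the step reduces to plain gradient descent, matching the µ = 0 case described above.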
For each instance, the network infers a top prediction: the class for which it deduces the highest affinity. When the instance's ground-truth class is known, we can compare the two class assignments: the predicted one and the correct one.
For a given image, the classifier infers class confidences indicating how likely the image belongs to each of the distinguished classes. We can therefore sort the predicted classes by the confidence value the classifier assigned. The top-k error is the fraction of predictions for which the ground-truth class is not among the k predicted classes with the highest confidence. In the case of the top-1 error, we check whether the target label matches the single prediction with the highest confidence. The confidence value expresses the affinity of an instance to a class.
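The top-k error as defined above can be computed directly from the confidence values. This is a hypothetical helper mirroring the metric's definition, not part of Squidify Vision's API.

```python
def top_k_error(confidences, true_labels, k=1):
    """Fraction of samples whose true class is NOT among the k
    highest-confidence predicted classes."""
    misses = 0
    for scores, truth in zip(confidences, true_labels):
        # indices of the k classes with the highest confidence
        top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        misses += truth not in top_k
    return misses / len(true_labels)
```

For `k=1` this reduces to the top-1 error: a miss whenever the highest-confidence class differs from the target label.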
Test and Results
After the classification model was trained, we downloaded random bird images from Google to test the model.
In this example, we were able to build a convolutional neural network image classifier with Squidify Vision that recognized different bird species with an accuracy of 79.64%. This showcases that building a custom image classification model from scratch is easy and straightforward, and anyone should be able to create their own classification model for their specific application.