Nowadays, a growing number of industries use tools from Artificial Intelligence to manage and develop their activities and businesses. Image Recognition is a key application in Computer Vision, a specific field of AI. The automotive industry uses it in self-driving cars. The electronics industry uses it for facial detection. Even the farming industry uses Image Recognition to monitor cattle and plantations and make sure both are in good health. Medical imaging uses it to detect health problems and abnormalities which, if caught in time, could help save many lives. Although it is used in more and more fields, Image Recognition is more complex than one might think.
Image Recognition, a complex Computer Vision task
Computer Vision relies on several tasks, and Image Recognition is one of them. It uses different techniques to locate, identify and categorize objects featured in a set of images (pictures or videos). It involves four complementary tasks, which are listed below.
Image Classification: Setting up a data set and labeling classes
Before building an effective algorithm to detect and recognize items in a picture, you need to set up the base of images and data the system will work on. You can either start from a pre-configured dataset or create a brand new one. A pre-configured dataset is a set of reference images that you download, already labeled according to the objects identified in the pictures. Once the dataset is ready, training and testing phases are required to make sure the machine is able to process the information.
The TensorFlow website delivers a thorough tutorial on its learning platform. It indicates all the steps which have to be followed to install, set up, train, and test your Image Classification system.
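As a rough illustration of the training and testing phases mentioned above, here is a minimal sketch that splits a labeled dataset into a training set and a test set. It is not taken from the TensorFlow tutorial: the file names, the two classes and the `split_dataset` helper are all hypothetical.

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled dataset: (file name, class label) pairs.
dataset = [(f"img_{i:03d}.jpg", "cat" if i % 2 == 0 else "dog") for i in range(10)]

train_set, test_set = split_dataset(dataset)
print(len(train_set), "training images,", len(test_set), "test images")
```

The test set is kept aside during training so that the system is evaluated on pictures it has never seen.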
Object Localization: Indicating the location of an object
Once all the labeling and categorization are done, the system goes through another phase: locating where the item is in the picture.
Unlike the human eye, machines don’t analyze a photograph, an image, or a video extract as a whole. They usually go through every single pixel and establish pixel patterns that they keep in memory.
Most of the time, the algorithm lays a grid over the picture, based on the various pixel patterns it observes. The picture is thus divided into cells in order to locate the items more easily.
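The grid idea can be sketched in a few lines. The example below assumes a regular grid of equal cells and a hypothetical `grid_cell` helper that tells which cell contains a given pixel, for instance the center of an object. Real systems may build their grids differently.

```python
def grid_cell(x, y, image_width, image_height, grid_size=3):
    """Return the (row, col) of the grid cell that contains pixel (x, y)."""
    col = min(x * grid_size // image_width, grid_size - 1)
    row = min(y * grid_size // image_height, grid_size - 1)
    return row, col

# An object centered at pixel (500, 120) in a 640x480 image, with a 3x3 grid.
print(grid_cell(500, 120, 640, 480))  # (0, 2): top row, right column
```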
Object Detection: Assigning the localized object with the proper label
The elements have been found in the picture. Now the algorithm surrounds them with bounding boxes and assigns each one the proper label, as configured during the Image Classification task.
Object Detection should not be confused with Object Recognition, which refers to the whole process we are describing here.
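To make the notion of a bounding box concrete, here is a minimal sketch that computes the smallest box around the pixels of a located object and attaches a label to it. The binary mask, the "car" label and the `bounding_box` helper are assumptions made for the example.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) around the nonzero pixels of a mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None  # nothing was located in this picture
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical mask: 1 marks the pixels that belong to the located object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

detection = {"label": "car", "box": bounding_box(mask)}
print(detection)  # {'label': 'car', 'box': (1, 1, 3, 2)}
```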
Object Segmentation: Highlighting the contour of the object in the image
Object Segmentation goes one step further than Object Detection. Instead of surrounding the objects with bounding boxes, it outlines and highlights the pixels which have been identified as part of the items.
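The difference with a bounding box can be shown with a small sketch. Assuming the object pixels are already known as a binary mask, the hypothetical `contour` helper below keeps only the object pixels that touch the background or the image border, which is one simple way to outline an item.

```python
def contour(mask):
    """Return the object pixels that touch the background or the image border."""
    height, width = len(mask), len(mask[0])
    edge = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixel
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                outside = not (0 <= nx < width and 0 <= ny < height)
                if outside or not mask[ny][nx]:
                    edge.add((x, y))
                    break
    return edge

# Hypothetical mask: a 3x3 object in the middle of a 5x5 picture.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
outline = contour(mask)
print(len(outline))  # 8: every object pixel except the center one
```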
These four different steps are crucial if a company wants to get the best performance and accuracy from the technique.
An Artificial Intelligence task that requires some supervision
An effective system rests on careful setup and training, which is why it goes through an extensive process that includes Supervised Learning.
What is supervised learning?
Supervised Machine Learning refers to an AI learning method in which somebody teaches the machine how to correctly use the tools it is given. The program first needs a base of references with proper labels in order to identify and categorize objects.
This methodology involves input variables (here, pictures and their labeled categories) and an output variable (the category detected for an item). Let’s take an example. You enter various images of cars (the input) and you want the machine to tell you whether there is a bicycle in any of them (the output).
How does it work?
The data scientist in charge of supervising the training phase of the program uploads a new set of data for the system to practice on.
When setting it up, he or she performs data augmentation in order to build a larger database. Data augmentation is a way to increase the number of reference images. Various modifications are applied to a picture, such as converting a color picture to grayscale, changing its orientation, blurring it, or translating it. Each modification creates a new entry in the database.
The more entries there are, the better the technique becomes at pixel pattern recognition. Data augmentation gives the system a better chance of producing accurate results, because it can more effectively compare a picture it has never encountered with the ones it has learned from its database.
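Some of these modifications are simple enough to sketch in plain Python. The example below assumes an image stored as rows of (R, G, B) pixels. The function names are hypothetical and blurring is left out to keep the sketch short.

```python
def to_grayscale(image):
    """Average the R, G, B channels of each pixel."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]

def flip_horizontal(image):
    """Change the orientation by mirroring each row."""
    return [row[::-1] for row in image]

def translate_right(image, shift, fill=0):
    """Shift every row to the right, padding the left side with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]

def augment(image, label):
    """Create new labeled entries from a single reference picture."""
    gray = to_grayscale(image)
    return [
        (gray, label),
        (flip_horizontal(gray), label),
        (translate_right(gray, 1), label),
    ]

# Hypothetical 2x3 color picture labeled "car".
picture = [
    [(30, 60, 90), (0, 0, 0), (255, 255, 255)],
    [(10, 20, 30), (90, 90, 90), (3, 3, 3)],
]
entries = augment(picture, "car")
print(len(entries), "entries created from one picture")
```

Each new entry keeps the label of the original picture, which is what makes augmentation cheap: no extra labeling work is needed.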
When entering the data into the system, the developer also has to create the labels assigned to the objects pictured in the reference set. Labeling the classes helps train the algorithm to know what the objects are, so that it can classify the items it will see in the future. To put it simply, a supervised learning technique learns from already labeled data in order to predict the correct outcome for new data.
Why does Image Recognition need training and testing with supervision?
Artificial Intelligence is inspired by the performance and abilities of the human brain. When setting up such a complex algorithm, meant to imitate a brain, it is quite natural for the system to be trained with the help of a human being. This is what supervised machine learning is all about. The goal is to train the system to predict the output for the future data it will be given.
The data scientist in charge of building a large database with the corresponding labels helps the system get better results. The results tend to be more accurate because the data has been entered and pre-configured by somebody who knows exactly what the pictures depict.
All the results from training sessions are carefully reviewed by the person responsible for teaching the machine. The researcher considers the training over only when the program has reached an acceptable level of performance, based on various parameters.
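One common parameter for deciding whether training is over is accuracy, the share of correct predictions. The sketch below is only an illustration: the predictions, the true labels and the 0.9 acceptance threshold are invented for the example.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical results from one training session.
predictions = ["tumor", "no tumor", "tumor", "no tumor", "no tumor"]
labels = ["tumor", "no tumor", "no tumor", "no tumor", "no tumor"]

TARGET_ACCURACY = 0.9  # hypothetical acceptance threshold
score = accuracy(predictions, labels)
training_done = score >= TARGET_ACCURACY
print(score, training_done)  # 0.8 False: more training is needed
```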
Supervised learning is mainly used for two kinds of problems: Classification and Regression. In a classification problem, the output is a category (for example “green” or “white”, or “tumor” and “no tumor”). In a regression problem, the output is a numerical value rather than a category (an amount of money, for example). The next part of our article gives you five examples of Supervised Machine Learning models.
Five Supervised Machine Learning Examples to choose from when working with Image Processing
When building up a Machine Learning model, there are plenty of different processes you can use, depending on what your goal is. Here are five of the most popular ones.
Linear Regression is a method that uses all the data collected from the database to build a “best-fit line”. The data extracted from the pictures is plotted as points on a graph with x and y coordinates. The algorithm then computes the line that passes as close as possible to all the points. The purpose of this kind of technique is to model the relationship between a known set of variables (the reference images the system has) and unknown variables (the images which will be submitted to the machine in the future). This relationship makes it possible to predict where the next data will fall on the graph.
Linear Regression needs careful supervision because the machine has to know exactly which variables it is working with. This is where the developer comes into play. He or she is responsible for giving the machine a usable database with accurate labels. The data has to comply with various rules and be as complete as possible so that the results are as accurate as possible. The developer needs to prepare input of high enough quality and to perform data augmentation to feed the algorithm.
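The line described above is usually obtained with the least-squares method. Here is a minimal sketch, assuming a single input feature. The feature (an average brightness score) and the values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical reference data: one feature per image and the known output.
brightness = [1, 2, 3, 4]
known_value = [3, 5, 7, 9]

slope, intercept = fit_line(brightness, known_value)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # predicted output for a new image: 11.0
```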
Decision Trees are a very popular method among developers and data scientists. The technique can be quite accurate and is appreciated by its users. It is based on the same decision process a human being follows. Subconsciously, we make more or less complex decisions based on many criteria. We usually go through all kinds of questions in our mind and answer them with yes or no until we have resolved our doubts and can make the final decision. Decision Trees function in much the same way. This algorithm is particularly appreciated because, unlike many other models, it can be understood by everybody. Machine Learning can be quite overwhelming for somebody who has never heard of it, so a decision tree’s readability is a great advantage.
This system uses the database to build a set of questions meant to predict the class of an item. Depending on the characteristics of the item, the answers to the questions lead to its class. Stacking questions creates more complex decision trees, which can distinguish more labels and categorize items more finely. The questions asked are usually very simple and, when layered, they allow the categorization of quite a wide range of items.
Decision trees are supervised by an agent who is in charge of creating a labeled dataset. During the training phase, he or she checks the accuracy of the results the process comes up with. If errors are noticed, the developer can easily identify where the problem lies, show it to the project manager and correct it.
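The chain of yes/no questions can be pictured with a very small sketch. The tree below is written by hand for the example, with hypothetical questions and classes. A real decision tree algorithm would learn the questions from the labeled dataset.

```python
# Each node asks one yes/no question; leaves are class labels.
tree = {
    "question": "has_wheels",
    "yes": {"question": "has_engine", "yes": "car", "no": "bicycle"},
    "no": "pedestrian",
}

def classify(node, features):
    """Follow the yes/no answers down the tree until a class is reached."""
    while isinstance(node, dict):
        answer = features[node["question"]]
        node = node["yes"] if answer else node["no"]
    return node

print(classify(tree, {"has_wheels": True, "has_engine": False}))  # bicycle
```

Because each prediction is just a path of answers, a wrong result can be traced back to the exact question that caused it.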
The Random Forest algorithm relies on the creation and consultation of multiple decision trees to come up with the best possible result.
Random Forest works on various sub-datasets drawn from the labeled data prepared by a supervising agent. Each subset is analyzed by a separate decision tree. This multiplication of sets leads to a variety of outputs that the algorithm combines into a final answer. This technique is less prone to overfitting than a single tree, but the process itself can be slower because the analysis relies on more trees and more data.
Random forests have shown much more satisfactory results when the decision trees are diverse and well configured.
The agent needs to make sure that the subsets are usable and that data augmentation has been done properly.
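The principle of sub-datasets and combined outputs can be sketched as follows. This is a simplified illustration, not a full Random Forest: each tree is reduced to a single question on one hypothetical feature (a brightness score), each subset is drawn at random with replacement, and the final answer is a majority vote.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-question tree: 'is the feature below the threshold?'."""
    values = sorted({x for x, _ in samples})
    majority = Counter(label for _, label in samples).most_common(1)[0][0]
    best = (len(samples) + 1, None, majority, majority)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = Counter(label for x, label in samples if x < threshold)
        right = Counter(label for x, label in samples if x >= threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(samples) - left[left_label] - right[right_label]
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_label, right_label = stump
    if threshold is None:  # the subset held a single feature value
        return left_label
    return left_label if x < threshold else right_label

def train_forest(samples, n_trees=25, seed=0):
    """Train each tree on its own random subset drawn with replacement."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_trees)
    ]

def forest_predict(forest, x):
    """Majority vote over the outputs of all the trees."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (brightness score, class).
data = [(0.1, "dark"), (0.2, "dark"), (0.3, "dark"), (0.15, "dark"), (0.25, "dark"),
        (0.8, "bright"), (0.9, "bright"), (0.7, "bright"), (0.85, "bright"), (0.75, "bright")]
forest = train_forest(data)
print(forest_predict(forest, 0.2), forest_predict(forest, 0.8))
```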
Support Vector Machines
Support Vector Machines, or SVMs, look for a hyperplane: a linear boundary that separates the items into classes. On a graph with x and y coordinates, this boundary is a straight line. The algorithm uses the data the agent has entered into the system, together with the labels assigned to the various classes. After analysis, the system divides the outputs into two groups on either side of this linear separation, depending on the results obtained during training.
SVMs need help from agents, first to set everything up and then to choose or code the Kernel Function the process will use for the analysis. SVMs use Kernel Functions to handle data that a straight boundary cannot separate directly. Pre-configured kernels exist, but if the developer or the users want to add a new variable, the agent will need to focus on this point as well. Like any other methodology, SVMs need to be trained. With somebody supervising, it is easier to detect and correct errors or a lack of accuracy, for instance.
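To give an idea of what a Kernel Function and a linear separation look like in practice, here is a minimal sketch. The three kernels are standard ones. The toy points, the weights and the bias are assumptions: the hyperplane was picked by hand for this data and is not the output of an SVM solver.

```python
import math

def linear_kernel(a, b):
    """Plain dot product between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    return (linear_kernel(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """Similarity that decays with the squared distance between a and b."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

def linear_svm_predict(point, weights, bias):
    """Classify a point by the side of the hyperplane it falls on."""
    score = linear_kernel(weights, point) + bias
    return 1 if score >= 0 else -1

# Hypothetical toy data: class -1 near the origin, class +1 further away.
negatives = [(1, 1), (2, 1), (1, 2)]
positives = [(4, 4), (5, 4), (4, 5)]

# Hand-picked separating hyperplane for this toy data (an assumption).
weights, bias = (0.4, 0.4), -2.2

print([linear_svm_predict(p, weights, bias) for p in negatives + positives])
```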
Convolutional Neural Networks
Convolutional Neural Networks, or CNNs, are artificial neural networks. These are inspired by human neurons. Researchers have developed these networks to mimic the way our visual cortex works. CNNs follow a specific architecture. They are made of a series of convolutional and pooling layers applied to the input. The first convolutional layer is applied to several small parts of the input and analyzes the pixel patterns in these parts. After this, a pooling layer comes in and acts like a cleanser: it condenses the results of the first layer and keeps only the strongest responses. A second convolutional layer is then applied, covering larger parts of the input. Then another pooling layer follows, and so on, with as many layers as required. In the end, a final layer flattens all the results in order to produce the outputs, allowing the objects to be classified.
Such a complex architecture can lead to confusion and failures. This is why the monitoring and supervision of an agent can be useful, first for the setup and second for the training. In case of errors or failures, the data scientist can intervene quickly and efficiently in order to achieve the targeted accuracy.
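The sequence of layers can be followed step by step on a tiny example. The sketch below is a simplified illustration in plain Python, with one hand-written filter instead of learned ones. The 5x5 image and the vertical-edge filter are assumptions. A real CNN would rely on a framework and learn its filters during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the strongest response in each size x size block."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(feature_map):
    return [v for row in feature_map for v in row]

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written filter that responds to dark-to-bright vertical edges.
edge_filter = [[-1, 1], [-1, 1]]

features = flatten(max_pool(relu(convolve2d(image, edge_filter))))
print(features)  # [2, 0, 2, 0]
```

The flattened list is what a final classification layer would receive as input.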
Is Supervised Learning reliable and accurate?
Using a supervised Machine Learning technique requires taking into account a lot of different variables. The algorithms mentioned above do not all give the same results, and choosing one is not an easy thing to do. Users have to analyze the situation they are in and the reasons why they want to use this specific tool.
Benefits and Disadvantages
When we use supervised learning methods, we have to take into account that the method has been configured by a human being. The judgments the machine makes can be very close to the ones somebody would have made in real life, since the basis for its decisions comes from a human brain. Techniques supervised by a person are very likely to be accurate, and most of the time people use them for their trustworthiness. The fact that some of them, such as Decision Trees, can be reviewed and understood by people who don’t know anything about Machine Learning adds to the confidence of those who choose to work with them. Let’s keep in mind, however, that supervised training is also subject to errors made by the developer when setting up the program and/or the dataset: mistakes do happen. But with supervised learning, they can still be corrected.
It is important to note that these supervised learning algorithms take a lot of time, starting with entering all the data manually and setting it up with the corresponding labels. Going through the whole process of a random forest technique, for example, is very time-consuming, as the machine has to analyze many different sets of images and compile all the results before giving its outputs. Unsupervised machine learning avoids the time-consuming labeling step, since it works on unlabeled pictures and groups them into classes on its own.
Supervised learning is not the only method used by IT scientists.
Other Machine Learning Models
It is worth remembering that supervised learning models are highly valued methods in Artificial Intelligence. But other types of models have been developed by researchers and implemented in the field of Computer Vision.
Whereas supervised methods require all data and class labels to be set, unsupervised methods are able to process unlabeled data. Providing only the input works just fine for them, and the desired output is not specified either. The goal of unsupervised learning is to break down the structure of each image in order to understand its underlying patterns.
No teacher is provided to monitor the Image Processing and check for failures, or to check whether the answers and outputs are correct.
Usually, unsupervised learning algorithms are used to gather information about customers in order to detect their purchase habits and to suggest new items that a customer might enjoy, based on what he or she has shown interest in. It allows the system to group customers by type.
It is also used to encourage people to buy certain objects: the process has previously noticed that somebody who bought the dress you are looking at also purchased this pair of shoes and that bag.
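Grouping customers without any label is typically done with a clustering algorithm. The sketch below uses k-means, a common choice that the article does not name, on a single hypothetical feature (the amount spent per customer). The starting centroids are also assumptions.

```python
def k_means(points, centroids, iterations=10):
    """Group 1-D points around the nearest centroid, then update centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical amounts spent by six customers; no labels are provided.
spending = [10, 12, 11, 100, 105, 98]
centroids, groups = k_means(spending, centroids=[10, 100])
print(centroids)  # [11.0, 101.0]
print(groups)     # [[10, 12, 11], [100, 105, 98]]
```

The algorithm never learns what the two groups mean. It only finds that the customers fall into two distinct spending patterns.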
Semi-supervised learning is a mix of supervised and unsupervised learning methods. It relies on both labeled and unlabeled data. It only needs a few labeled classes, suggestions, and example labels to function.
Most of the time, perhaps surprisingly, it is the result of a user who started labeling all the data, realized it was far too time-consuming and expensive, and decided to let the machine analyze the rest.
This hybrid model is actually very successful for tasks such as facial recognition and object detection. The YOLO (You Only Look Once) model has reportedly adopted this learning method to be more effective and accurate. Facial recognition involves a multitude of different inputs. Having both labeled and unlabeled data available is a great help to the algorithm, as it can rely on the variety of faces presented in the process.
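One simple way to let the machine "analyze the rest" is self-training: the system labels the unlabeled samples it is most confident about, then learns from them. The sketch below is an illustration under assumptions: a single hypothetical feature per image, a nearest-centroid rule, and an arbitrary confidence distance of 1.6. It is not how YOLO or any specific model does it.

```python
def self_train(labeled, unlabeled, max_distance=1.6, rounds=3):
    """Label the unlabeled values that lie close enough to a class centroid."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        # Recompute one centroid per class from everything labeled so far.
        groups = {}
        for value, label in labeled:
            groups.setdefault(label, []).append(value)
        centroids = {label: sum(vs) / len(vs) for label, vs in groups.items()}

        still_unlabeled = []
        for value in remaining:
            label = min(centroids, key=lambda l: abs(value - centroids[l]))
            if abs(value - centroids[label]) <= max_distance:
                labeled.append((value, label))  # confident: accept the label
            else:
                still_unlabeled.append(value)   # too uncertain: wait
        remaining = still_unlabeled
    return labeled, remaining

# Two hand-labeled samples and five unlabeled ones (hypothetical feature values).
labeled, remaining = self_train(
    labeled=[(1.0, "cat"), (9.0, "dog")],
    unlabeled=[1.5, 2.5, 8.0, 7.0, 5.2],
)
print(dict(labeled))
print(remaining)  # [5.2] stays unlabeled: it is too far from both classes
```

Note that the value 7.0 is only accepted in the second round, once the "dog" centroid has moved closer to it.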
Supervised learning or Unsupervised learning: which option is best for me?
These two processes both have advantages and limits. Even though users tend to want to control the entirety of Image Processing, all the projects are different from each other. This is why you might want to consider some aspects before choosing your solution:
The first task for Image Processing is to gather labeled or unlabeled data. If you choose to build up your own dataset and label each input, you will need to have a qualified and dedicated person to fulfill this task. If you plan on using an unlabeled set of data, you will not need help from anybody.
You also need to define the purpose of your project: do you want to work on Classification or on Regression, for example? Do you want your machine to recognize faces, or just to count the number of people coming in and out of your store in order to analyze your customers’ habits?
Lastly, you need to know about the characteristics your algorithm is going to be working on. For instance, the number of features it will have to go through, and the volume of the input. If there is too much data, will the system be able to support all of it? Do you plan on formatting all the inputs?
All of these points must be thoroughly analyzed in order to find the best fit for you. Supervised learning tends to be more accurate because it is guided by a person, while unsupervised learning is less transparent about how data is processed and carries a higher risk of inaccuracy.
The future of Supervised Learning
Supervised learning remains one of the preferred machine learning methods for Image Recognition and Classification. But depending on its use, the learning approach will not be the same from one company to the next. Over the past few years, the rise of semi-supervised learning has been changing the landscape. This combined process is quite interesting for some users. It mixes supervised and unsupervised methods to create a new one that deals with the same problems and aims for the same level of accuracy and trustworthiness. This new approach undoubtedly has a bright future ahead.
A new method, stemming from semi-supervised learning, is currently shaking up the field of Image Classification: one-shot learning. This Object Categorization approach deals with hundreds of varied training samples. It takes them in pairs and tells whether the objects in the images are the same or different. It is based on the ability people have to learn objects and associate them with objects they have already seen in the past. Based on this observation, the one-shot learning process only has to learn a few different categories during the setup phase in order to keep them in memory and assimilate new inputs fairly easily.
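The pairwise comparison can be sketched as a distance between feature vectors. In the example below, the vectors, the 1.0 threshold and the two classes are invented. In a real system, the vectors would come from a trained network rather than being written by hand.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_object(a, b, threshold=1.0):
    """Decide whether a pair of images shows the same object."""
    return distance(a, b) < threshold

def classify_one_shot(query, support):
    """Assign the class whose single stored example is closest to the query."""
    return min(support, key=lambda label: distance(query, support[label]))

# One stored example per class (hypothetical feature vectors).
support = {"mug": (0.0, 1.0), "phone": (5.0, 5.0)}
query = (0.5, 1.2)

print(same_object(query, support["mug"]))    # True
print(same_object(query, support["phone"]))  # False
print(classify_one_shot(query, support))     # mug
```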
IT researchers are always looking for new ways to improve our daily lives and the way our businesses function. The health industry, the food industry, e-commerce: many fields of activity are already taking advantage of the Computer Vision and Machine Learning tools available to us. In the foreseeable future, we may well see the emergence of new techniques and technologies that could benefit even more activities.