Artificial Intelligence and Computer Vision are all around us today: we ask Siri to schedule the next financial meeting, we unlock our phones with facial recognition, and so on. Both are prominent topics in the tech industry. AI relies on many different tools and techniques to imitate human intelligence and reproduce it with algorithms running on a variety of devices. Computer vision is the field that focuses on machines’ ability to analyze and understand images and videos, and image recognition with machine learning is one of its core tasks.
What is image recognition?
Image recognition is a mechanism used to identify an object within an image and to classify it in a specific category, modeled on the way humans recognize objects across different sets of images.
How does image recognition work for humans?
When we see an object or an image, we are, as humans, able to know immediately and precisely what it is. People sort everything they see into categories, based on attributes they identify in the objects. That way, even when we don’t know exactly what an object is, we are usually able to compare it to categories of objects we have already seen and classify it based on its attributes. Let’s take the example of an animal that is unknown to us. Even if we cannot identify its species, we are still able to identify it as an animal.
People rarely think about what they are observing and how they identify objects; it happens entirely subconsciously. People aren’t focused on everything that surrounds them all the time. Our brain has been trained to identify objects quite easily, based on our previous experiences, that is to say, on objects we have already encountered. We have an extraordinary power of deduction: when we see something that resembles an object we have seen before, we are able to deduce that it belongs to a certain category of items. We don’t necessarily need to look at every part of an image to identify the objects in it. As soon as we see a part of an item that we recognize, we know what it is. We usually rely on colors and contrasts to identify items.
For humans, most image recognition happens subconsciously. For machines, it is a lot more complicated.
How does image recognition work with machines?
Machines only recognize the categories of objects that we have programmed into them. They are not naturally able to know and identify everything they see. If a machine is programmed to recognize one category of images, it will not be able to recognize anything outside of that category. It will only be able to tell whether the objects present in a set of images correspond to the category or not. Either the machine will try to fit the object into the category, or it will ignore it completely.
For a machine, an image is only data: an array of pixel values. In a color image, each pixel contains red, green, and blue values (from 0 to 255 for each of them). In a grayscale (black and white) image, each pixel holds a single intensity value, from 0 (black) to 255 (white).
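To make this concrete, here is a minimal sketch of an image as nested lists of pixel values, written in plain Python. The tiny 2x2 image and the `to_grayscale` helper are made up for illustration, and the plain channel average is an assumption (imaging libraries often use weighted sums instead).

```python
# A tiny 2x2 color image: each pixel is a (red, green, blue) triple in 0..255.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def to_grayscale(image):
    """Convert an RGB image to one intensity per pixel (0 = black, 255 = white)."""
    gray = []
    for row in image:
        # Plain average of the three channels (an illustrative simplification).
        gray.append([round((r + g + b) / 3) for (r, g, b) in row])
    return gray

gray_image = to_grayscale(rgb_image)
print(gray_image)  # [[85, 85], [85, 255]]
```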
Machines do not look at the image as a whole; they are only interested in pixel values and in the patterns in these values. They simply take the pixel patterns of an item and compare them with other patterns. If two patterns are close enough, the machine associates them and recognizes the second pattern as something it has already encountered. In that sense, the machine looks for groups of similar pixel values across images and tries to place them in specific image categories.
A program very rarely recognizes an image with 100% confidence, because pixel patterns are almost never exactly the same from one image to the next. Solving these problems and finding improvements is the job of researchers, the goal being to offer users the best possible experience.
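The idea of two patterns being "close enough" can be sketched as follows. This is a deliberately naive illustration, not how a trained model compares images: the function names and the threshold of 20 are assumptions chosen for the example.

```python
def mean_abs_difference(patch_a, patch_b):
    """Average absolute difference between two equally sized grayscale patches."""
    total = 0
    count = 0
    for row_a, row_b in zip(patch_a, patch_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def is_match(patch_a, patch_b, threshold=20):
    # `threshold` is an arbitrary illustrative value, not a standard constant.
    return mean_abs_difference(patch_a, patch_b) <= threshold

reference = [[10, 200], [10, 200]]   # dark column next to a bright column
similar = [[14, 190], [12, 206]]     # same pattern with a little noise
different = [[200, 10], [200, 10]]   # the opposite pattern

print(mean_abs_difference(reference, similar))  # 5.5
print(is_match(reference, similar))             # True
print(is_match(reference, different))           # False
```

The noisy patch is never identical to the reference, yet it is accepted because the difference stays under the chosen threshold.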
Practicing Image recognition with machine learning
The goal of image recognition is to identify, label and classify the objects detected in an image into different categories. Object or image recognition is a whole process that involves various traditional computer vision tasks:
Image classification: assigning a label to an image, based on a set of categories.
Object localization: identifying the location of an object in an image by surrounding it with a bounding box.
Object detection: determining the presence of objects with the help of bounding boxes and assigning each of them to the class it belongs to.
Object segmentation: distinguishing the various elements by identifying and locating each and every item in the picture. Segmentation doesn’t use bounding boxes but highlights the contour of each object in the image.
For the past few years, these computer vision tasks have achieved big successes, mainly thanks to machine learning applications.
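To illustrate how the outputs of these tasks relate to each other, the sketch below turns a segmentation result (a mask marking the pixels of one object) into the bounding box used by localization and detection. The mask and the `mask_to_bounding_box` helper are hypothetical examples.

```python
def mask_to_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all non-zero pixels, or None."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None  # the mask contains no object
    return (min(xs), min(ys), max(xs), max(ys))

# 1 marks a pixel that belongs to the object, 0 marks background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

box = mask_to_bounding_box(mask)
print(box)  # (1, 1, 3, 3)
```

The mask follows the contour of the object, while the box only keeps its extent, which is why segmentation carries more detail than localization.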
Processes and Models
In order to complete these four tasks, machine learning and image recognition systems need to go through a few important steps.
Set up, Training and Testing
First of all, the machine has to know exactly what it has to look for, so you need to define the parameters it will work with. Defining the bounding boxes and the elements they contain is crucial. To do so, the machine has to be provided with references, which can be pictures, videos, etc. These references make it more efficient when analyzing future data. Together they form a sort of data library that the Neural Network then uses to distinguish the various objects. A Neural Network is composed of multiple artificial neurons, which are loosely inspired by the neurons of the human brain, and it is trained with algorithms that are also inspired by the way the brain functions. If we want the image recognition model to analyze and categorize different breeds of dogs, the model will need a database of labeled images of those breeds in order to recognize them.
Second, the model needs to go through a training phase, in which the dataset is fed to the program. This phase is meant to train the Convolutional Neural Network (CNN) to identify specific objects and assign them accurately to the corresponding classes.
Before an AI model is used, it needs to be thoroughly tested. To do so, it is evaluated on images that were not part of the training phase. Based on whether the program identifies all the items and on the accuracy of its classification, the model is approved or not.
Below are some of the most popular machine learning models for image recognition and how they work.
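The three phases above can be sketched in a few lines. Everything here is a toy assumption: each "image" is reduced to a single made-up feature (its mean brightness), the "model" is a simple threshold, and the split keeps the original order, whereas real projects usually shuffle the data first.

```python
def split_dataset(samples, test_fraction=0.25):
    """Hold out the last part of the labeled samples for testing."""
    n_test = max(1, int(len(samples) * test_fraction))
    return samples[:-n_test], samples[-n_test:]

def train_threshold_model(train_samples):
    """'Train' by placing a threshold halfway between the two class means."""
    dark = [f for f, label in train_samples if label == "dark"]
    bright = [f for f, label in train_samples if label == "bright"]
    threshold = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2
    return lambda feature: "bright" if feature > threshold else "dark"

def accuracy(model, test_samples):
    """Fraction of held-out samples the model labels correctly."""
    correct = sum(1 for f, label in test_samples if model(f) == label)
    return correct / len(test_samples)

# Hypothetical labeled data: (mean brightness of the image, label).
samples = [(30, "dark"), (220, "bright"), (45, "dark"), (200, "bright"),
           (60, "dark"), (240, "bright"), (20, "dark"), (180, "bright")]

train_set, test_set = split_dataset(samples)   # set up
model = train_threshold_model(train_set)       # training
score = accuracy(model, test_set)              # testing on unseen samples
print(score)  # 1.0
```

The important point is that `test_set` is never shown to `train_threshold_model`, so the score reflects how the model behaves on data it has not seen.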
Support Vector Machines (SVM)
SVM models determine whether or not an image corresponds to the target object. From the dataset it is given, the SVM is trained to find a hyperplane that separates the categories. Afterwards, each object is placed in that space according to its pixel values, and its position relative to the hyperplane predicts its category, based on the separation learned during the training phase.
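As a rough sketch of this idea, the code below fits a linear SVM with plain subgradient descent on the hinge loss. It is a simplified stand-in for what a library implementation does, and the data points, feature meaning, learning rate, and other parameters are all assumptions made for the example.

```python
def train_linear_svm(points, labels, epochs=500, learning_rate=0.01, reg=0.01):
    """Fit a hyperplane w.x + b = 0 with hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or too close: move the hyperplane.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Point is safely classified: only apply the regularization shrink.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b

def predict(w, b, x):
    """The side of the hyperplane on which x falls decides its category."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-feature descriptions of images, scaled to 0..1.
points = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = target object, -1 = anything else

w, b = train_linear_svm(points, labels)
print(predict(w, b, [0.95, 0.9]))  # 1
print(predict(w, b, [0.05, 0.1]))  # -1
```

In practice the features would be derived from pixel values, and a library would be used for training, but the prediction step is exactly this check of which side of the hyperplane a point falls on.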
Bag of Features Models
A bag of features model takes into account the image to be analyzed and a reference sample photo. The algorithm then tries to match pixel patterns from the sample photo with parts of the target picture.
Viola-Jones Algorithm
The Viola-Jones algorithm is one of the most famous models used for facial recognition, and it was in use even before CNNs. It scans the faces of people, extracts some of their features, and classifies them. It also uses a boosting algorithm, which is meant to make the classification much more accurate.
Convolutional Neural Networks
We have mentioned CNNs earlier in this article, but it is necessary to go a little deeper into the concept.
Machine learning relies on what humans provide to it. It is mainly supervised by people: they deliver the set of reference images, train the machine to distinguish the objects, and test the method. A CNN is a specific model architecture from Deep Learning. CNNs allow machines to detect and classify the objects observed in a picture with impressive precision.
This type of algorithm works with successive layers. It is often hard to interpret the role of a specific layer in the final prediction, but research has made progress on this. We can for example interpret that one layer analyzes colors, another shapes, the next one the textures of the objects, etc. At the end of the process, it is the combination of all layers that makes a prediction possible.
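The basic operation inside each convolutional layer can be sketched in plain Python. This is a minimal illustration with no padding and a stride of 1; the image and the hand-written kernel are made up, whereas in a real CNN the kernel values are learned during training. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[0, 510, 0], [0, 510, 0]]
```

The feature map is zero in the flat areas and large exactly where the edge is, which is the kind of low-level pattern an early layer can pick up before later layers combine such patterns into shapes and textures.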
Popular Image recognition Algorithms
Deep Learning has proven to be extremely efficient at detecting objects and classifying them. Different approaches are available, and each has its own characteristics. Here are three of them.
Faster Region-based CNN (Faster RCNN)
Faster RCNN is a Convolutional Neural Network algorithm based on region analysis. After training on a reference set, when it analyzes a new image, Faster RCNN proposes regions of the picture where an object could be found. These areas of interest are surrounded by bounding boxes and cropped, then analyzed and classified into the proper category. Why is it called Faster RCNN? Because it is faster than the earlier RCNN variants: the regions are proposed by the network itself, and by focusing on regions where objects are likely to be, the program does not have to go through the whole image to analyze each and every pixel pattern.
Single Shot Detector (SSD)
The Single Shot Detector algorithm addresses the same task as RCNN, but in a single pass. When bounding boxes are identified and drawn, they often overlap each other. To handle this, SSDs divide the image with grids and use boxes of various aspect ratios. If one box lies on top of another, for example because the system detected a girl in front of a car, the algorithm uses two different anchor boxes in order to separate the two items. That way, the picture is divided into different feature maps that are processed separately, and the machine is able to handle the analysis of more objects. This technique proves to be successful and accurate, and it can be executed quite rapidly.
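Overlap between bounding boxes is usually measured with the intersection over union (IoU), and a common way for detectors, SSD included, to clean up duplicate boxes is non-maximum suppression. The sketch below is a simplified version under assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, the detections are invented, and the 0.5 threshold is only a typical illustrative value. Real detectors generally apply this step per class, so that a girl and a car are not suppressed against each other.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box among boxes that overlap too much.

    `detections` is a list of (box, score) pairs for a single class.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept

# Hypothetical detections of the same class: two near-duplicates and one separate box.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),
    ((100, 100, 140, 140), 0.7),
]

kept = non_max_suppression(detections)
print([box for box, _ in kept])  # [(10, 10, 50, 50), (100, 100, 140, 140)]
```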
You Only Look Once (YOLO)
As the name of the algorithm suggests, the technique processes the whole picture only once, using a fixed-size grid. It looks for elements in each cell of the grid and determines whether an item is present. If so, the item is surrounded with a bounding box and classified into a category. Looking at the grid only once makes the process very fast, but there is a risk that the method misses fine details. The results can be less accurate than with the SSD method.
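The role of the fixed-size grid can be sketched as follows: each object is assigned to the grid cell that contains the center of its bounding box. The function name and the example image size are assumptions; the 7x7 grid matches the one used in the original YOLO model, but other versions use different sizes.

```python
def grid_cell_for_box(box, image_width, image_height, grid_size=7):
    """Return the (row, col) of the grid cell containing the center of the box."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Clamp so that a center lying exactly on the far edge stays inside the grid.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col

# Hypothetical object in a 448x448 image.
cell = grid_cell_for_box((200, 100, 300, 220), image_width=448, image_height=448)
print(cell)  # (2, 3)
```

Because every cell is responsible for only a limited number of objects, small items that fall into the same cell are easy to miss, which is one way to understand the loss of detail mentioned above.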
Programming Image recognition
Some accessible solutions exist for anybody who would like to get familiar with these techniques. An introductory tutorial on this specific topic is even available from Google.
Various methods are used to detect items in a picture and classify them. But how do we apply them to our devices?
Programming with Python language
Python is a programming language that lets you make your devices work the way you want them to. One of the best things about Python is that it supports many different libraries, especially ones dedicated to Artificial Intelligence. Image detection and recognition are available with Python.
To start working on this topic, Python and the necessary extension packages should be downloaded and installed on your system. Some of the packages come with easy-to-understand code and make AI approachable. It is recommended to own a device that handles images effectively, for instance one with a good graphics card. The next step is to provide Python and the image recognition application with a free, downloadable and already labeled dataset, in order to start classifying the various elements. Finally, a little bit of coding will be needed, including drawing the bounding boxes and labeling them.
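As a small taste of that last step, here is a sketch that draws a labeled bounding box on a tiny grayscale image stored as nested lists. The annotation format and the `draw_bounding_box` helper are hypothetical; an imaging library would normally do the drawing, but the logic is the same.

```python
def draw_bounding_box(image, box, value=255):
    """Return a copy of a grayscale image with the outline of `box` set to `value`."""
    x_min, y_min, x_max, y_max = box
    result = [row[:] for row in image]  # copy so the original stays untouched
    for x in range(x_min, x_max + 1):
        result[y_min][x] = value  # top edge
        result[y_max][x] = value  # bottom edge
    for y in range(y_min, y_max + 1):
        result[y][x_min] = value  # left edge
        result[y][x_max] = value  # right edge
    return result

# A blank 6x5 image and one hypothetical annotation from a labeled dataset.
image = [[0] * 6 for _ in range(5)]
annotation = {"label": "dog", "box": (1, 1, 4, 3)}

boxed = draw_bounding_box(image, annotation["box"])
print(annotation["label"])
for row in boxed:
    print("".join("#" if value else "." for value in row))
```

The printed grid shows the outline of the box as `#` characters around the region labeled "dog".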
Application Programming Interface (API)
An API is an interface that links two different pieces of software so that they can exchange data and/or functionalities. For image recognition, this solution is mainly used to get picture data from a Cloud API, such as those offered by Amazon Web Services (AWS). That way, you get access to a wide library of image references.
Programming item recognition with this method can be done fairly easily and rapidly, so you can deploy the program within a short period of time. However, it should be taken into consideration that exchanging images with an online cloud might lead to privacy and security issues. This approach is best used for testing, or at least for something that is not meant to be permanent.
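To give an idea of what talking to such an API looks like, the sketch below prepares an HTTP request carrying an image. The endpoint URL, the JSON field name, and the bearer-token header are all hypothetical: every provider documents its own URL, authentication scheme, and payload format. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/recognize"

def build_recognition_request(image_bytes, api_key):
    """Prepare a POST request with the image encoded as base64 inside a JSON body."""
    payload = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed authentication scheme
        },
        method="POST",
    )

request = build_recognition_request(b"fake image bytes", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
# With a real service, the request would then be sent with urllib.request.urlopen(request).
```

Note that the image itself leaves your machine in the request body, which is exactly where the privacy and security concerns mentioned above come from.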
Edge AI
Unlike cloud APIs, Edge AI keeps the images confidential. The images are processed directly on the device they come from, so there is no need to worry about putting them on the cloud.
Edge AI is very often used with real-time videos. In most cases, it will be used with connected objects or any item equipped with motion sensors.
Some online platforms let you create an image recognition system without starting from zero. If you don’t know how to code, or if you are not sure how to launch such a project, you might consider using this type of pre-configured platform.
The different fields of application for image recognition with machine learning
Nowadays, Computer Vision and recognition are all around us, from unlocking your phone with your face in the morning to walking into a mall to do some shopping. Many different industries have decided to implement Artificial Intelligence in their processes.
Face Analysis
Face analysis is a major application of recognition. It is used by many companies to detect several faces at the same time, for example in order to know how many people there are in an image. Face recognition can be used by police and security forces to identify criminals or victims. Face analysis involves gender detection, emotion estimation, age estimation, etc.
The reason businesses want to identify these characteristics is quite simple to understand: it allows them to analyze precisely who their customers are. That way, a fashion store can learn that 80% of its clientele are women, that the average age is between 30 and 45, and that the clients don’t seem to appreciate a particular article in the store: their facial expressions tend to show disappointment when they look at this green skirt. Knowing all of these details helps businesses understand their targets and adjust their communication in the future.
Face analysis is also widely used for identification. Apple developed a way to unlock your phone with your face, and later a method to do it without taking off your surgical mask.
Health and Medicine
Treating patients can be challenging. Sometimes a tiny element is missed during an exam, leading medical staff to deliver the wrong treatment. To prevent this from happening, the healthcare sector has started to analyze the imagery acquired during treatment. X-rays and other scans can all be processed with image recognition to detect even a small change from one examination to the next: the progression of a tumor or of a virus, the appearance of abnormalities in veins or arteries, etc.
Agriculture
Farmers’ daily lives are far from easy. To keep taking good care of both their animals and their crops, they need to monitor them both.
Image recognition has become a convenient way for farmers to watch their cattle. With cameras equipped with motion sensors and image detection programs, they are able to make sure that all their animals are in good health. They can also monitor animal births: farmers can easily detect if a cow is having difficulties giving birth to its calf and intervene rapidly to help, thus preventing the potential death of two animals.
Farmers also grow their own plants, mainly to feed their cattle. To check that the fields are in good health, image recognition can be programmed to detect, for example, the presence of a disease on a plant. The farmer can then treat the crop rapidly and harvest with peace of mind.
Security and Safety
Security and safety are two major concerns in today’s society. Thanks to image recognition and detection, it gets easier to identify criminals or victims, and even weapons. Take an airport, for example, where safety is crucial. All day long, security agents scrutinize screens. Helped by Artificial Intelligence, they are able to detect dangers extremely rapidly. When a piece of luggage is left unattended, the agents watching the screens can immediately get in touch with the field officers in order to get the situation under control and protect the public as soon as possible. When a passport is presented, the individual's fingerprints and face are analyzed to make sure they match the original document.
Insurance companies are also using recognition technologies. When somebody reports a robbery and asks the insurance company for compensation, the company regularly asks the victim to provide video footage or surveillance images to prove that the crime did happen. Very often, the thief is caught on camera and can be identified. Sometimes, the guilty individual is prosecuted and faces charges thanks to facial recognition.
E-commerce
Online stores have been booming since the beginning of the COVID-19 pandemic. They managed to grow their activities rapidly thanks to various elements.
One of the recent advances they have come up with is image recognition to better serve their customers. Many platforms are now able to identify the favorite products of their online shoppers and to suggest new items to buy, based on what they have viewed previously.
On another note, some new applications let their users simply snap a picture of an item worn by somebody they have met in the street, in order to find a store that sells the same or a similar item. The algorithm is then able to give a list of places where you can buy the shoes your friend was wearing today.
Improvements made in the field of AI and picture recognition over the past decades have been tremendous. There is no doubt that researchers are already looking for new techniques built on the possibilities these technologies provide.