What is Image Annotation in Machine Learning
This ultimate guide covers all the important aspects of image annotation: what is meant by image annotation? How do you annotate an image? What are the different annotation types? What is an image annotation tool? Find out what image annotation is all about, and how it can improve your business.
Introduction
Image annotation underpins numerous commercial Artificial Intelligence (AI) products on the market, and it is one of the crucial processes in Computer Vision (CV). It is a critical step in delivering AI for many business applications: automating the processing of vehicle accident images in the insurance sector, detecting vehicles and pedestrians for autonomous vehicles in the transportation sector, helping medical personnel detect cancers on medical images…
Concretely, image annotation is the process of technically affixing labels to an image or a series of images. This process is used in different Machine Learning tasks: to classify images, to detect objects in images, and to segment images. The associated labeling task can then take various forms depending on the objective and the model constraints: from one label for an entire image to multiple labels for every cluster of pixels within that image, with many different possible shapes (bounding boxes, polygons, lines…).
Why is image annotation so important?
Image annotation is the key prerequisite for successful computer vision applications and for delivering business value. Labeling provides the knowledge that the AI model absorbs during training. Despite progress in alternative approaches (e.g., unsupervised learning), supervised learning based on image annotation remains the most effective way to tackle very complex business problems.
A well-known example is Tesla, where supervised computer vision is at the heart of the autonomous vehicle strategy. Andrej Karpathy, then Tesla’s Director of AI, explained that the company was shifting from a radar-based approach to a pure computer vision approach, in which image annotation plays a very important role in making the model robust enough to reach vehicle autonomy. This is done by combining auto-labeling techniques with human labeling to reach the required volume of training data.
What are different image annotation applications?
Many current applications rely on image annotation. Some of the most influential use cases across major industries are as follows:
Insurance
Insurance companies process many images, from accident photos to scanned accident reports. Processing this data can be automated by annotating damage on car images to assess its severity level, or by extracting handwritten content with a custom optical character recognition (OCR) model.
Illustration of labeling in insurance
Transportation
With the increasing demand for fast, accurate, and ecological transportation, computer vision applications have become numerous in the sector, creating very large image annotation needs. Autonomous vehicles rely on computer vision models trained with correctly labeled images to detect objects in the picture and classify them. Similarly, image annotation of vehicles can be used to estimate city traffic flows.
Illustration of labeling in transportation
Manufacturing
Manufacturing businesses utilize image annotation to provide real-time information about inventory levels within their warehouses. Trained computer models can evaluate stock image data to decide if or when a product might soon be out-of-stock and needs replenishing. In addition, specific manufacturers use image annotation to monitor key infrastructure elements within their plants. Teams digitally label images of their vital equipment components, information which can then be used to detect defects or to automate traditional visual inspections.
Illustration of labeling in manufacturing
Health and Healthcare
AI-powered applications can augment the diagnostic work of medical personnel, supporting them in their day-to-day jobs. For instance, AI can examine medical imaging (X-ray, CT scans, MRI…) to estimate the probability that a disease is present. Medical teams can train a computer vision model on a multitude of MRI scans labeled with both cancerous and non-cancerous zones until the model learns to differentiate them on its own. Similarly, as more and more glass slides are scanned into digital content, pathology detection can be automated by annotating these images.
Tracking of movements also has a lot of value for medical applications to detect neurological pathologies and facilitate medical recovery. This requires specific annotation (see pose estimation) to enable computer vision models to track human movements.
It is important to note that AI is not intended to substitute for trained and specialized medical advice, but it can be used to add accuracy to critical health determinations.
Illustration of labeling in healthcare
Agribusiness
The agriculture industry utilizes AI, video or image-based, for a myriad of benefits, such as:
Estimating future crop yield
Evaluating soil content
Planning for future agricultural expansion
Developing autonomous vehicles & machinery
Automating landmarking
One farming business annotates still digital images to distinguish between weeds and crops, right down to the pixel level. This annotated imagery is then used to apply chemical pesticides only to the areas where weeds are growing, rather than spraying the entire field. This reduces chemical weed spraying and saves significant amounts of money on pesticides each year.
Labeling in agritech for crop maturity tracking
Finance and FinTech
Banking and finance companies use facial recognition technology to verify the identity of their customers withdrawing money from their ATMs. This is accomplished through what is called a pose points image annotation process, which digitally maps key facial features such as eyes, nose, and mouth. Consequently, facial recognition presents a more direct and precise method of defining identity, reducing the prospect of fraud.
Pose estimation for facial recognition
Retail
Image annotation is required to build computer vision systems that can examine an entire product catalog and return relevant results to the end user. Retailers are also piloting image annotation-based systems within their stores. These systems periodically scan digital images of product shelves to determine whether a product is close to running out of stock and needs reordering. They can also scan barcode images to collect product information using what is known as image classification, a key image annotation method that is discussed further below.
Labeling in the retail sector
Drone/aerial imagery
The development of drone technologies has led to applications in various industries, for example reducing the cost of complex visual inspections. It has also produced large volumes of image & video data to which AI systems can be applied.
Labeling of aerial imagery
Security and surveillance
Various computer vision use cases are valuable in this sector, for example automating the detection of hazardous situations and potentially raising security alerts. For these use cases, CCTV images can be annotated.
Security & surveillance labeling
Robotics
The robotics sector relies on computer vision to let robots operate autonomously in their environment. This requires a strong image labeling effort to cover all the situations a robot can encounter and to deal with sector-specific constraints (e.g., depth estimation challenges for object grasping). The applications of computer vision in the sector are wide, from the space industry to industrial, medical, and military uses.
Object detection for robotic purposes
What are the different types of image annotation?
Types of annotations offered within the Kili Technology's data labeling platform
As described earlier, image annotation is the process of annotating target objects within a digital image’s region of interest. This is performed to train a machine to recognize objects under the same classes in unseen images and visual scenes. However, this method can be quite challenging. That’s because there are different approaches to developing deep learning model architectures and techniques for training a machine to do this.
Indeed, image annotation can be leveraged to fulfill various tasks:
Image classification: the simplest form of image annotation where a class (e.g., presence of a given object) is attributed at the entire image level.
Object recognition & detection: detection of an object in an image, with the identification of the location and the category of the associated object.
Segmentation: refers to dividing the various parts of an image into categories.
Semantic segmentation: a specific subclass of segmentation, where each pixel of a picture is attributed to a given category, hence leading to the establishment of various pixel-based regions in a picture. In this case, multiple objects will be treated as one category.
Instance segmentation: a specific subclass of segmentation, where, as opposed to semantic segmentation, objects are treated individually.
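As a rough illustration of how the tasks listed above differ in the labels they require, the sketch below describes the same toy image at each level. The 4x6 pixel grid, the field names, and the box convention are all assumptions made for this example, not the format of any particular tool.

```python
# Toy 4x6 image containing two cars; every value below is made up for illustration.

# Image classification: one label for the whole image.
classification_label = "car"

# Object detection: one category and one box (x_min, y_min, x_max, y_max) per object.
detection_labels = [
    {"category": "car", "box": (0, 1, 2, 3)},
    {"category": "car", "box": (4, 1, 6, 3)},
]

# Semantic segmentation: one category id per pixel (0 = background, 1 = car).
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance segmentation: one object id per pixel (0 = background, 1 and 2 = two cars).
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def distinct_ids(mask):
    """Return the set of non-background ids present in a mask."""
    return {value for row in mask for value in row} - {0}


semantic_categories = distinct_ids(semantic_mask)  # both cars share one category
instance_objects = distinct_ids(instance_mask)     # each car keeps its own id
print(len(semantic_categories), len(instance_objects))
```

The semantic mask only says which pixels belong to the "car" category, while the instance mask also keeps the two cars apart.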
To fulfill the above tasks, various image annotation shapes may be required:
Bounding Boxes
Bounding box labeling using Kili Technology
This is a simple yet versatile type of image annotation, which is the primary reason it is among the most widely used techniques for annotating images in datasets for deep learning-based computer vision applications. As its name implies, objects of interest are enclosed in rectangular boxes.
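To make the bounding box idea concrete, here is a minimal sketch of one possible representation, together with intersection over union (IoU), a common measure of how closely two boxes match (for example an annotator's box and a reference box). The Box class, its corner-based pixel convention, and the sample coordinates are assumptions for illustration only.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        # An inverted box (no overlap) gets an area of zero.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, between 0.0 and 1.0."""
    overlap = Box(
        max(a.x_min, b.x_min), max(a.y_min, b.y_min),
        min(a.x_max, b.x_max), min(a.y_max, b.y_max),
    )
    union = a.area + b.area - overlap.area
    return overlap.area / union if union > 0 else 0.0


annotator_box = Box(10, 10, 50, 50)   # made-up coordinates
reference_box = Box(30, 30, 70, 70)   # made-up coordinates
print(round(iou(annotator_box, reference_box), 3))
```

An IoU of 1.0 means the two boxes coincide, and 0.0 means they do not overlap at all.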
Polygons
Polygon labeling in the Kili Technology Platform
For this image annotation method, polygons are used in place of simple bounding boxes. This can increase model accuracy when locating objects within a region of interest in the image, and it can also improve object classification accuracy. That’s because a polygon excludes the noise around the object of interest, that is, the unnecessary pixels around the object that tend to confuse classifiers.
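The "noise" removed by a polygon can be quantified with a little geometry. The sketch below, under the assumption of a made-up triangular object, compares the area of the polygon (computed with the shoelace formula) with the area of the bounding box that would enclose it. The function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box_area(points):
    """Area of the smallest axis-aligned box containing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical triangular object outlined by an annotator.
triangle = [(0, 0), (40, 0), (20, 30)]
object_area = polygon_area(triangle)
box_area = enclosing_box_area(triangle)

# Share of the bounding box that is background rather than object.
background_share = 1 - object_area / box_area
print(object_area, box_area, background_share)
```

For this triangle, half of the pixels inside the bounding box would be background, which is exactly what the polygon leaves out.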
Semantic Segmentation
Semantic segmentation labeling using Kili Technology
This is the most precise type of image annotation since the annotation comes in the form of a mask attributing a category to a given object on an image at the pixel level.
3D Cuboids
3D cuboids annotations performed using Kili Technology's data annotation platform
This is an image annotation method that’s commonly used for target objects in 3D scenes and photos. As its name implies, the difference between this method and bounding boxes is that annotations for this technique include depth, and not just height and width.
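A cuboid annotation can be sketched as a bounding box with one extra dimension. The representation below (center, size, and a rotation around the vertical axis) is only one possible convention and is an assumption of this example, as are the sample values.

```python
import math
from dataclasses import dataclass


@dataclass
class Cuboid:
    """3D box: center, dimensions, and rotation around the vertical axis (assumed convention)."""
    center: tuple      # (x, y, z)
    size: tuple        # (length, width, height)
    yaw: float = 0.0   # rotation in radians

    def volume(self):
        length, width, height = self.size
        return length * width * height

    def corners(self):
        """Return the 8 corner points of the cuboid."""
        cx, cy, cz = self.center
        length, width, height = self.size
        cos_y, sin_y = math.cos(self.yaw), math.sin(self.yaw)
        result = []
        for dx in (-length / 2, length / 2):
            for dy in (-width / 2, width / 2):
                for dz in (-height / 2, height / 2):
                    # Rotate the horizontal offset by yaw, then translate to the center.
                    x = cx + dx * cos_y - dy * sin_y
                    y = cy + dx * sin_y + dy * cos_y
                    result.append((x, y, cz + dz))
        return result


# Made-up car-sized cuboid resting on the ground (z = 0).
car = Cuboid(center=(10.0, 2.0, 0.8), size=(4.0, 2.0, 1.6))
print(len(car.corners()), car.volume())
```

Compared with a 2D box, the annotator has to provide depth and orientation in addition to height and width.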
Line Annotation
Line annotation using Kili Technology
Lines and splines are used for this image annotation method to mark the boundaries of a region of interest within an image that contains the target object. This is often used when regions of interest containing target objects are too thin or too small for bounding boxes. Vector annotation can be a specific subcase of line annotation when the movement information is important for modeling purposes.
Landmark Annotation
Landmark annotation in Kili Technology
Also known as dot annotation, this method places dots on or around target objects within a region of interest of the image. It is frequently used for finding and classifying target objects surrounded by or containing much smaller objects. It is also often used to mark the outline of the target object.
Pose estimation
Pose estimation in the Kili Technology data labeling platform
Pose estimation combines dots and lines, and additionally encodes the order of the associated points. This can be particularly useful in the medical sector to track body movements: typically, a human arm movement can be tracked with three main points (shoulder, elbow, and hand).
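The arm example can be sketched in a few lines. Below, the keypoint names, their order, the skeleton links, and the pixel coordinates are all assumptions for illustration. The point is that ordered keypoints make it possible to derive quantities such as the elbow angle from a single annotated frame.

```python
import math

# Ordered keypoints for one arm; the order carries meaning (shoulder -> elbow -> hand).
ARM_KEYPOINTS = ["shoulder", "elbow", "hand"]
# Lines of the skeleton, connecting consecutive keypoints.
ARM_SKELETON = [("shoulder", "elbow"), ("elbow", "hand")]


def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c (points must be distinct)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Made-up pixel coordinates annotated on one video frame.
frame = {"shoulder": (100, 100), "elbow": (100, 200), "hand": (200, 200)}

elbow_angle = joint_angle(*(frame[name] for name in ARM_KEYPOINTS))
limb_lengths = {
    (start, end): math.dist(frame[start], frame[end]) for start, end in ARM_SKELETON
}
print(elbow_angle, limb_lengths)
```

Repeating this on every frame of a video would give the evolution of the elbow angle over time.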
Focus on a specific case of scanned documents
A specific case of OCR labeling addressed in Kili Technology's data labeling platform
Scanned documents are images from which characters can be extracted for processing by a computer. This is done with Optical Character Recognition (OCR) methods. This type of task requires a specific kind of annotation that combines a bounding box, to locate a phrase on the document, with a transcription task, to extract the associated content.
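A minimal sketch of such a combined annotation is shown below: each item pairs a box with the text transcribed by the annotator. The OcrAnnotation class, the box convention, and the sample content are assumptions made for this example. Sorting by the top-left corner is a simplistic way to recover reading order and would not handle multi-column layouts.

```python
from dataclasses import dataclass


@dataclass
class OcrAnnotation:
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    transcription: str  # text typed by the annotator for this box


# Made-up annotations on a scanned report, in the order they were drawn.
annotations = [
    OcrAnnotation(box=(40, 120, 300, 150), transcription="Date of accident:"),
    OcrAnnotation(box=(40, 30, 400, 70), transcription="ACCIDENT REPORT"),
    OcrAnnotation(box=(310, 120, 420, 150), transcription="2021-03-14"),
]


def reading_order(items):
    """Sort top-to-bottom, then left-to-right, using each box's top-left corner."""
    return sorted(items, key=lambda item: (item.box[1], item.box[0]))


document_text = " ".join(item.transcription for item in reading_order(annotations))
print(document_text)
```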
What do you need to annotate images?
Four key elements are required to start your image annotation task: 1) a diverse set of images that represents reality, 2) trained annotators to carry out the task, 3) a suitable annotation platform to achieve the project goal, and 4) a project management process to avoid pitfalls.
Diverse images
Diversity in the images to label is key for the model to behave correctly. The first aspect is class balance: if you are trying to detect cars & motorbikes on road images, your model needs to be trained on a significant number of examples of both categories. Diversity within each class also matters, to cover all the situations that the model may encounter in the future. For example, when detecting road vehicles from CCTV, it is important to have class diversity (cars but also trucks, motorbikes, bicycles…) and diversity of external conditions (day, night, sun, rain…).
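Class balance is easy to check before labeling at scale. The sketch below counts the share of each class in a made-up list of labels and flags classes that fall under a threshold. The 5% threshold and the function names are arbitrary assumptions, to be adapted to each project.

```python
from collections import Counter

# Made-up label list for 1,000 annotated vehicles.
labels = ["car"] * 900 + ["truck"] * 60 + ["motorbike"] * 30 + ["bicycle"] * 10


def class_shares(labels):
    """Return the fraction of the dataset taken by each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def underrepresented(labels, min_share=0.05):
    """List the classes whose share is below min_share (arbitrary threshold)."""
    return sorted(name for name, share in class_shares(labels).items() if share < min_share)


print(class_shares(labels))
print(underrepresented(labels))
```

Classes flagged this way are candidates for collecting more images before training. The same count can be repeated per condition (day, night, rain) if that metadata is available.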
Suitable annotation platform
A suitable tool is key to delivering a labeling project quickly and efficiently. Many solutions are available, from in-house development or open source tools to enterprise platforms. Important dimensions to assess when selecting one are:
Productivity
Quality
Project management
Security
ML Ops
See our webinar "Data Labeling: What Are my Options?" for more in-depth insights on this topic. An on-demand replay is available.
Trained annotators
Trained annotators do the work; they can be in-house or outsourced depending on the project needs and the complexity of the underlying task. Training the workforce is important at the beginning of a project and has to rely on extensive annotation guidelines that include examples and describe edge cases. A test/train project with a review step can then be used to make sure the labeling team has understood the task correctly.
Labeling workforce expertise level can be raised and maintained thanks to small iteration loops and continuous improvement of labeling guidelines.
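One simple way to run the review step on a test project is to compare each annotator's labels with labels validated by a reviewer. The sketch below computes a plain agreement rate. The metric, the function name, and the sample data are assumptions for illustration, and real projects may prefer task-specific metrics.

```python
def agreement_rate(annotator_labels, reference_labels):
    """Share of assets on which the annotator matches the reviewed reference label."""
    shared = annotator_labels.keys() & reference_labels.keys()
    if not shared:
        return 0.0
    matches = sum(annotator_labels[key] == reference_labels[key] for key in shared)
    return matches / len(shared)


# Made-up labels: the reviewer's validated answers and one trainee's answers.
reference = {"img_1": "car", "img_2": "truck", "img_3": "car", "img_4": "bicycle"}
annotator = {"img_1": "car", "img_2": "car", "img_3": "car", "img_4": "bicycle"}

rate = agreement_rate(annotator, reference)
disagreements = sorted(key for key in reference if annotator.get(key) != reference[key])
print(rate, disagreements)
```

The list of disagreements points to the assets worth discussing with the annotator, and often to edge cases missing from the guidelines.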
Project management process
Managing a labeling project is a complex task that requires a dedicated process. This process has to cover the timeline, the team, the requirements, the distribution of tasks, the mix of manual and automated labeling, and quality management.
See our webinar "Data Labeling: Insider Tips on Data Annotation Project Management". An on-demand replay is available.
How does Kili support complex image annotation?
Kili’s labeling platform has been designed to overcome the challenges linked to image annotation, relying on key elements:
1. Customizable interfaces for all your data
2. Powerful workflows for fast & accurate annotation
3. Automation tools to speed up labeling
4. Analytics & Reporting to monitor the progress
5. Labeling mistakes identification & fixing in ML datasets
6. Advanced quality metrics to get quality insights
7. Issue workflows to spot anomalies and solve errors
8. CLI and delegated access to import & export data effortlessly
9. Role-based access to manage your team at scale
10. Powerful SDK & API to integrate into your MLOps infrastructure & automate labeling
How long does Image Annotation take?
The duration of an image annotation task strongly depends on the type of task and its complexity: an image classification job can be very fast (less than 1 minute per image) for a basic task, when the picture doesn’t require much analysis to reach a first decision.
On the other hand, segmentation jobs are the most complex tasks when a very high level of precision is required, for example at the pixel level. In this case, annotating a single image can take tens of minutes.
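These orders of magnitude can be turned into a rough project estimate. The sketch below is back-of-the-envelope arithmetic only: the dataset size, the per-image durations, and the team size are made-up assumptions, and the calculation ignores review time and automation.

```python
def estimated_hours(num_images, minutes_per_image, num_annotators=1):
    """Rough elapsed hours, assuming work splits evenly and ignoring review time."""
    return num_images * minutes_per_image / 60 / num_annotators


# Hypothetical project: 10,000 images labeled by 5 annotators.
classification_hours = estimated_hours(10_000, minutes_per_image=0.5, num_annotators=5)
segmentation_hours = estimated_hours(10_000, minutes_per_image=20, num_annotators=5)

print(round(classification_hours, 1), round(segmentation_hours, 1))
```

With these assumed figures, the segmentation project takes 40 times longer than the classification one for the same number of images.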
Bibliography/webography
Open source datasets (KITTI dataset from Andreas Geiger, Philip Lenz, and Raquel Urtasun; BCCD dataset; Open Data Commons; FFHQ dataset)
Press articles (e.g., Venturebeat.com - Tesla AI Chief explains why self-driving cars don’t need lidar)
An article by Jean Latapy, Solution Engineer @ Kili Technology, and Kili Technology's team