Data annotation is the process of labelling training data to make it usable in supervised learning tasks. In its 2018 survey "What AI can and can't do (yet) for your business," McKinsey identified the lack of labeled data as the first limitation to AI applications. To tackle this limitation we created the Kili Technology annotation platform, which provides companies with relevant, high-quality labeled data. While AI allows us to automate more and more human tasks, we cannot get rid of the "human in the loop" when it comes to data annotation: producing labeled data in the quality and quantity that industrial AI applications demand requires an organized workforce and dedicated software.
Labeled images are the backbone of AI systems such as self-driving cars and automated medical imagery analysis, which now require tens or hundreds of thousands, even millions, of images to train. The acquisition cost of this data therefore cannot be neglected. Earlier annotation techniques such as bounding boxes are cheap but limit the performance of deep learning models: they break down on overlapping entities and non-rectangular objects. A bounding box drawn around a crack in a wall, for instance, can end up covering the entire image. Pixel-accurate labelling has become the new norm because it removes most of the noise that bounding boxes introduce into the data. But since annotating an image at the pixel level is far more time-consuming than drawing a bounding box, it can cost up to 10x more.
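To make that noise concrete, here is a minimal NumPy sketch (a toy example of our own, not from any library) comparing the pixels covered by a tight bounding box with the pixels of the object itself, for a thin diagonal "crack":

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> tuple:
    """Tight bounding box (x_min, y_min, x_max, y_max) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# A thin diagonal "crack" across a 100x100 image.
mask = np.zeros((100, 100), dtype=bool)
mask[np.arange(100), np.arange(100)] = True

x0, y0, x1, y1 = mask_to_bbox(mask)
bbox_area = (x1 - x0 + 1) * (y1 - y0 + 1)  # pixels inside the box
mask_area = int(mask.sum())                # pixels of the crack itself

# 99% of the box is background: that is the label noise
# that pixel-accurate annotation removes.
noise_ratio = 1 - mask_area / bbox_area
```

Here the tight box covers 10,000 pixels while the crack itself occupies only 100, so a model trained on the box would see 99% background labeled as "crack".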
In this article we focus on image segmentation, compare the segmentation tools currently available, and see how they can reduce annotation time and cost.
Image segmentation is the process of partitioning an image into multiple segments, where every pixel within a segment carries a semantic label. Three segmentation tasks are common in the industry:
Instance segmentation: each individual object in the image is annotated at the pixel level. It is the equivalent of pixel-accurate bounding boxes.
Semantic segmentation: each pixel of the image is associated with a semantic label, without distinguishing between instances.
Panoptic segmentation: a combination of the two, in which each pixel is associated with a semantic label while also accounting for each object instance in the image.
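The three tasks can be illustrated on a toy label map. The sketch below encodes a panoptic label per pixel using the COCO-panoptic convention `class_id * 1000 + instance_id` (one common encoding, not the only one):

```python
import numpy as np

# Toy 4x4 image: two "car" objects (class 1) on "road" background (class 0).
semantic = np.array([[0, 0, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]])

# Instance ids distinguish the two cars; background "stuff" stays 0.
instance = np.array([[0, 0, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 2, 2],
                     [0, 0, 2, 2]])

# Panoptic label: a single id per pixel encoding both class and instance.
panoptic = semantic * 1000 + instance
```

Semantic segmentation keeps only the first map, instance segmentation only the per-object masks, and panoptic segmentation needs both: the two cars share a class but get distinct panoptic ids (1001 and 1002).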
In addition to being time-consuming, image segmentation is not immune to human error, especially once annotator fatigue sets in after labelling many images.
Tooling for image segmentation
The tools used to perform image segmentation fall into three categories:
Digital brush and pen
In its most classic form, pixel-accurate segmentation is obtained with a digital pen or brush that lets the user manually annotate the different entities in an image. When evaluating such a tool, check that the drawn boundaries of objects are automatically adjusted when they overlap. This functionality saves a lot of annotation time, as it can be really tedious to perfectly annotate objects with common boundaries.
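One simple way such boundary adjustment can work is to clip each new brush stroke against already-labeled pixels, so the annotator can paint sloppily over a neighbor's edge and the shared boundary stays where it was first drawn. A minimal sketch of that policy (our own toy logic, not a specific product's implementation):

```python
import numpy as np

def add_annotation(existing: np.ndarray, new_mask: np.ndarray,
                   overwrite: bool = False) -> np.ndarray:
    """Add a new object's mask to a label map of already-annotated pixels.

    When overwrite is False, the new stroke is clipped to unlabeled pixels,
    so overlapping strokes never redraw an existing boundary.
    """
    if not overwrite:
        new_mask = new_mask & (existing == 0)
    out = existing.copy()
    out[new_mask] = existing.max() + 1  # assign the next object id
    return out

labels = np.zeros((4, 4), dtype=int)
first = np.zeros((4, 4), dtype=bool)
first[:, :2] = True            # left object: columns 0-1
sloppy = np.zeros((4, 4), dtype=bool)
sloppy[:, 1:] = True           # second stroke overlaps column 1

labels = add_annotation(labels, first)
labels = add_annotation(labels, sloppy)
```

After both strokes, the shared column still belongs to the first object and the second object only keeps the pixels that were free.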
Deep learning powered tools
Recent progress in deep learning for image segmentation, such as the Polygon-RNN++ and DEXTR papers, has enabled deep-learning-based annotation tools. These tools generate a pixel-accurate annotation of an object either from a bounding box placed around it or from a few points placed along its edges.
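DEXTR, for example, asks the annotator for the four extreme points of the object (left-, right-, top- and bottom-most) and infers the full mask from them. The helper below shows what those four clicks correspond to on a binary mask; it is a small illustration of the prompt format, not the DEXTR model itself:

```python
import numpy as np

def extreme_points(mask: np.ndarray) -> dict:
    """Return the four extreme pixels of a binary mask as (x, y) points,
    i.e. the clicks a DEXTR-style tool asks the annotator for."""
    ys, xs = np.nonzero(mask)
    return {
        "left":   (int(xs.min()), int(ys[xs.argmin()])),
        "right":  (int(xs.max()), int(ys[xs.argmax()])),
        "top":    (int(xs[ys.argmin()]), int(ys.min())),
        "bottom": (int(xs[ys.argmax()]), int(ys.max())),
    }

# A plus-sign-shaped object on a 5x5 grid.
mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True
mask[:, 2] = True
pts = extreme_points(mask)
```

For the plus sign, the four points land on the tips of its arms: (0, 2), (4, 2), (2, 0) and (2, 4).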
Foundation model: Segment Anything Model (SAM)
Meta's Segment Anything Model (SAM) marks a major leap in image segmentation. It is a promptable foundation model: given a point, a box, or a rough mask as a prompt, it returns a pixel-accurate mask for the corresponding object, without task-specific retraining.
Traditional segmentation models require extensive human annotation and meticulously crafted training datasets for each new domain, rendering them expensive and impractical for diverse visual content. SAM takes a different route: it was trained on SA-1B, a dataset of over one billion masks collected with a model-in-the-loop "data engine" in which the model itself helped annotate the images. The result is a model that generalizes zero-shot and can segment virtually anything within an image, from common objects like cars and trees to more abstract elements such as shadows and reflections.
SAM also handles complex scenes where conventional segmentation models often falter: multiple objects, occlusions, and varying lighting conditions. Whether parsing objects in busy urban streets or delineating intricate architectural details, it consistently delivers strong results.
Although developed within Meta's ecosystem, SAM has found applications across many domains, including medical imaging for precise anomaly detection, autonomous vehicles for enhanced environment perception, and content moderation for swift identification of inappropriate or harmful content. This versatility underscores its significance as a pioneering technology for visual data.
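In practice, SAM's predictor (`SamPredictor.predict` in the `segment_anything` package, with `multimask_output=True`) returns several candidate masks per prompt together with predicted-quality scores, and a labeling tool typically keeps the best one. Below is a minimal sketch of that selection step with SAM's output mocked as plain NumPy arrays, so no checkpoint is needed; the threshold value is our own assumption:

```python
import numpy as np

def pick_best_mask(masks: np.ndarray, scores: np.ndarray,
                   min_score: float = 0.5):
    """Keep the highest-scoring candidate mask, or None if all
    candidates fall below the quality threshold."""
    best = int(scores.argmax())
    return masks[best] if scores[best] >= min_score else None

# Mocked SAM output: three candidate masks for one point prompt,
# each with a predicted quality score.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, :2, :2] = True
masks[1, :3, :3] = True
masks[2, :, :] = True
scores = np.array([0.62, 0.91, 0.40])

best = pick_best_mask(masks, scores)
```

Here the 3x3 candidate wins with score 0.91; in a real pipeline, low-scoring prompts would be sent back to the annotator instead.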
Foundation model: DINOv2
DINOv2, an evolution of the original DINO (self-DIstillation with NO labels), is a significant advance in self-supervised visual representation learning, combining efficiency with strong accuracy across many vision tasks, segmentation included.
DINOv2 builds on the foundation laid by DINO, which introduced self-supervised learning techniques for image representations, and extends those capabilities to dense tasks such as segmentation, a task traditionally marred by the labor-intensive process of manual annotation.
At its core, DINOv2 excels in data efficiency. Through self-supervised learning it extracts meaningful representations from vast unlabeled datasets, eliminating the need for human labeling at pre-training time. This reduces the cost of creating annotated datasets and makes the model's features adaptable to varied visual content.
DINOv2 does not trade accuracy for efficiency. Its features support robust segmentation even in complex scenes with multiple objects and occlusions, which makes it valuable in applications like autonomous vehicles, where recognizing objects with high accuracy is paramount, and medical imaging, where detecting subtle anomalies is critical.
DINOv2's applications extend beyond image classification and segmentation. Its features are used in content moderation, for swift and precise identification of inappropriate or harmful content, and in robotics, where they support scene understanding and let robots interact effectively with their environments.
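A common recipe for using DINOv2 in segmentation is to freeze the backbone and attach only a lightweight per-patch classifier on top of its embeddings. The sketch below illustrates the simplest such head, nearest-centroid classification of patch features; the embeddings are mocked, so no model is loaded, and the centroid values are made up for illustration:

```python
import numpy as np

def segment_patches(feats: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each patch embedding to its nearest class centroid
    by cosine similarity: a minimal frozen-feature segmentation head."""
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=-1, keepdims=True)
    return (f @ c.T).argmax(axis=-1)

# Mocked class centroids (dim 4) for two classes, e.g. "road" and "car".
centroids = np.array([[1., 0., 0., 0.],
                      [0., 1., 0., 0.]])

# Mocked frozen-backbone patch embeddings: two patches near each centroid.
feats = np.array([[0.90, 0.10, 0.00, 0.00],
                  [1.00, 0.00, 0.10, 0.00],
                  [0.10, 0.95, 0.00, 0.00],
                  [0.00, 1.00, 0.00, 0.10]])

labels = segment_patches(feats, centroids)
```

With real DINOv2 features, the centroids would be estimated from a handful of labeled patches, which is precisely why so few annotations are needed.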
For deeper insights into DINOv2, see our comprehensive article available at this link.
Image segmentation with an image annotation tool
Embedded Image Segmentation
Integrating SAM (Segment Anything Model) into a data labeling tool turns it from a standalone model into an integral part of the labeling workflow. On Kili's platform, SAM provides initial segmentation predictions that annotators use as a starting point, significantly reducing the manual effort and time that annotation tasks require.
This symbiosis between SAM and the labeling tool also brings consistency to image segmentation across datasets, which is especially critical for extensive labeling projects that demand uniform annotations. The human touch remains essential: annotators review and refine SAM's predictions, ultimately enhancing the quality of the segmented data.
The integration also scales: SAM's presence within Kili's platform allows efficient handling of large and complex datasets, making it a powerful tool for projects that demand substantial volumes of annotated data. Lastly, SAM's competence with intricate scenes, multiple objects, and occlusions aligns well with Kili's focus on real-world, diverse image data in domains like autonomous vehicles, medical imaging, and content moderation.
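A workflow built on model pre-annotations typically needs a triage step: predictions confident enough to accept automatically versus predictions queued for human review. Here is a minimal sketch of that step; the record format and threshold are hypothetical illustrations, not Kili's actual API:

```python
def triage_predictions(predictions, accept_threshold=0.9):
    """Split model pre-annotations into auto-accepted masks
    and a queue for human review, based on the model's confidence."""
    accepted, to_review = [], []
    for pred in predictions:
        if pred["score"] >= accept_threshold:
            accepted.append(pred)
        else:
            to_review.append(pred)
    return accepted, to_review

# Hypothetical pre-annotations with model confidence scores.
preds = [
    {"id": "mask_a", "score": 0.97},
    {"id": "mask_b", "score": 0.55},
    {"id": "mask_c", "score": 0.92},
]
accepted, to_review = triage_predictions(preds)
```

Raising the threshold trades annotator time for quality: more masks go through review, fewer model errors slip into the dataset.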
For further insights into the SAM-Kili integration, check out our article here.
Fine-tuning Image Segmentation models through labeled images
An image annotation tool such as Kili also plays a key role in fine-tuning segmentation models like DINOv2. The tool streamlines the annotation of a carefully selected subset of images, which can substantially improve model performance on a specific use case. Injecting human-generated annotations into the model's training loop yields better adaptability and precision at lower cost, and reduces the model's reliance on voluminous labeled datasets, making it practical to customize for specialized tasks or new scenarios.
It also enables iterative refinement: the model can be adjusted continuously as requirements evolve and segmentation accuracy is tuned. In short, the synergy between human annotators and strong pretrained models speeds up the development of custom computer vision solutions while keeping the cost of extensive data labeling in check.
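A standard way to choose which images to annotate in each refinement round is uncertainty sampling: send the images the model is least sure about to the annotators first. A minimal sketch using prediction entropy as the uncertainty measure (the image names and probabilities are made up for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_images_to_label(image_probs, budget=2):
    """Rank unlabeled images by prediction entropy and return
    the most uncertain ones, up to the annotation budget."""
    ranked = sorted(image_probs, key=lambda item: entropy(item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:budget]]

# Hypothetical per-image class probabilities from the current model.
pool = [
    ("img_1", [0.98, 0.02]),  # model is confident
    ("img_2", [0.50, 0.50]),  # maximally uncertain
    ("img_3", [0.60, 0.40]),
]
selected = pick_images_to_label(pool)
```

With a budget of two, the confident image is skipped and the two ambiguous ones go to the annotators, which is where a human label teaches the model the most.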
For a step-by-step illustration of the integration between DINOv2 and Kili's platform, please refer to our in-depth tutorial available at this link.
In summary, this article has explored how cutting-edge image segmentation tools such as SAM and DINOv2, integrated into platforms like Kili, are transforming AI dataset creation. These advances break down traditional barriers by offering companies a more efficient, cost-effective, and adaptable way to build high-quality image datasets. By speeding up the labeling process, ensuring precision through a combination of human expertise and AI, and allowing tailored dataset customization, these tools empower companies to develop more accurate and versatile AI applications, spanning industries from medical imaging to autonomous vehicles and beyond.