Label engineering: Bounding box vs Polygon

When setting up an object detection project, you have to choose an annotation tool. Bounding boxes are the most commonly used tool in machine learning and artificial intelligence projects, but other tools, such as polygons, are also available. What are the differences between the two, and which one should you choose for your project?

Differences

Bounding boxes are rectangles drawn by the annotator. Like any axis-aligned rectangle, a bounding box is fully defined by two points: two opposite corners. At Kili, the user therefore only has to click on one corner, hold the left mouse button, and drag to the opposite corner to draw a bounding box. The task is easy to understand and to complete, which is why bounding boxes are the most widely used tool in the industry. In many cases a bounding box is sufficient to define the position of an object in the image. However, for objects that are not even roughly rectangular, a bounding box is not precise enough and another tool is needed.
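To make the "two points" idea concrete, here is a minimal Python sketch of how a click-and-drag gesture could be turned into a box. The names BoundingBox and box_from_drag are hypothetical and chosen for illustration; this is not Kili's API or export format.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class BoundingBox(NamedTuple):
    """Axis-aligned rectangle stored as its two extreme corners (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def box_from_drag(press: Point, release: Point) -> BoundingBox:
    """Build a box from the point where the mouse was pressed and where it was released.

    The annotator may drag in any direction, so the corners are sorted
    to always obtain (x_min, y_min, x_max, y_max).
    """
    (x0, y0), (x1, y1) = press, release
    return BoundingBox(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# Dragging from the top-right corner to the bottom-left one gives the same box
# as dragging the other way round.
box = box_from_drag((120, 40), (30, 90))
print(box, box.width, box.height)
```

Whatever the drag direction, four numbers are enough to store the annotation, which is part of what makes the tool so quick to use.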

The polygon tool, by contrast, is much more precise but harder to draw. While a bounding box is defined by only two points, a polygon can have an arbitrary number of vertices and can therefore follow the outline of an object much more closely. The trade-off is that a polygon is slower to draw and more complex for an annotator to use.
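One rough way to quantify this difference in precision is to compare the area of a polygon with the area of the smallest bounding box that contains it. The sketch below uses the standard shoelace formula; the function names and the triangle coordinates are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple (non self-intersecting) polygon, using the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def enclosing_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(points: List[Point]) -> float:
    """Fraction of the enclosing box that is actually covered by the polygon."""
    x_min, y_min, x_max, y_max = enclosing_box(points)
    box_area = (x_max - x_min) * (y_max - y_min)
    return polygon_area(points) / box_area


# A right triangle covers exactly half of its bounding box.
triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(triangle), box_fill_ratio(triangle))
```

A ratio close to 1 means the box describes the object almost as well as the polygon does; the lower the ratio, the more background pixels the box includes.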

Bounding box
+ Simple
+ Quick
– Lack of precision

Polygon
+ More precise
– Slower to draw
– More complex to use

What is the best tool for my project?

As said above, bounding boxes are sufficient in most cases, so we recommend using them since they are far quicker to draw. Moreover, a comparison of annotation types found that using polygons for rectangular objects does not improve the model’s performance [1].

Polygons should be used for projects where objects do not fit well in a rectangular box. This can be due to their intrinsic shape or simply to their orientation in the picture. Let’s look at two typical use cases for polygons: geospatial data and autonomous driving.

Geospatial data is data that comes from drones and satellites. A common task for annotators is to outline a given zone (e.g. a forest, a house, a land lot, a park…), and this outline is rarely rectangular. That is why annotators must use polygons. In the case of autonomous driving, many objects have asymmetric shapes and therefore cannot be annotated precisely with a bounding box.
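The effect of orientation can be illustrated with a little geometry. A rectangle of size w × h rotated by an angle θ needs an axis-aligned box of width w·|cos θ| + h·|sin θ| and height w·|sin θ| + h·|cos θ|. The following sketch, with a hypothetical function name and made-up dimensions, computes the share of that box actually occupied by the object.

```python
import math


def rotated_box_fill_ratio(width: float, height: float, angle_degrees: float) -> float:
    """Share of the axis-aligned bounding box covered by a rotated rectangular object."""
    theta = math.radians(angle_degrees)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    box_width = width * c + height * s   # horizontal extent after rotation
    box_height = width * s + height * c  # vertical extent after rotation
    return (width * height) / (box_width * box_height)


# A long, thin object (10 x 2) seen at different orientations.
for angle in (0, 15, 45):
    print(angle, round(rotated_box_fill_ratio(10, 2, angle), 2))
```

Under these assumptions, the object fills its whole box when it is aligned with the image axes, but only about 28% of it at 45 degrees. The same object can therefore be well served by a bounding box in one image and poorly served in another, which is the situation where a polygon pays off.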

Furthermore, if you need to annotate polygons, the Kili platform helps you tackle their biggest drawback: they are more complex to draw than bounding boxes. Thanks to automatic tools such as interactive segmentation and superpixels, the time needed to annotate a polygon has been drastically reduced.

Conclusion

When starting an annotation project, it is important to decide which tool will be used to annotate. While many ML models achieve good accuracy with bounding boxes, some require polygons to perform well. In the past, many ML projects were restricted to bounding boxes because polygons were complex and time-consuming to annotate. With Kili, you can now annotate polygons much more easily and therefore boost your model’s performance.

References

[1] J. F. Mullen, F. R. Tanner and P. A. Sallee, “Comparing the Effects of Annotation Type on Machine Learning Detection Performance,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 855-861, doi: 10.1109/CVPRW.2019.00114.
