If you're reading this, odds are you're already using a data labeling tool. No doubt you've noticed a direct correlation between your labeling platform and the quality of your datasets, perhaps even to the point where you've considered changing platforms mid-project. In this article, you will find out how Kili Technology's data-centric approach sets us apart from the competition and solves our customers' issues.
Factors in switching labeling solutions
Throughout our journey with more than 500 customers, we have seen teams change labeling solutions mid-project and come to us. This decision was usually driven by dissatisfaction with the tooling, the integration, the performance results, or the relationship with the provider.
"Overall, I'm glad we switched to the younger company that's catalyzing changes in the compliance space. They have really nailed down the process and are super thorough and reliable."
An industry company recently came to Kili Technology with 500,000 images labeled in the past year and a need to annotate another 1M images in the coming year. The company focuses on developing intelligence to mitigate infrastructure risks; its customers include major electric, gas, and renewable energy corporations. Their goal is to leverage computer vision models to detect signs of infrastructure instability and degradation.
The tool they were using was slowing them down, hurting both their labeling productivity and their model performance. As a leader in their industry, they needed state-of-the-art AI applications to maintain their lead in a highly competitive market.
Unable to get quality results with a focus on productivity
Their first reason for dissatisfaction with their current tool was quality management.
Studies suggest that a 10% decrease in data quality leads to a 4% decrease in model performance and forces datasets to double in size to compensate. With an estimated 3.4% of labels in popular ML test sets shown to be erroneous, the impact of data quality is enormous. It is not enough, then, to provide a tool that lets ML teams annotate as many assets as possible in a given timeline if the results are of poor quality. When labeling platforms do not give teams the ability to deep-dive into the dataset to identify clusters of low-quality labels, or to filter their data efficiently to focus labelers on specific tasks, customers end up with plenty of labeled assets but poor model performance. When faced with the task of improving the annotations, our industry customer was forced to review the entire dataset, which cost them a massive amount of time and money.
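To make the idea of targeted review concrete, here is a minimal sketch of filtering a dataset so reviewers focus only on low-agreement assets instead of re-reviewing everything. The asset fields (`id`, `consensus`) and the 0.7 threshold are illustrative assumptions, not a specific platform's schema:

```python
# Illustrative sketch: target review at low-agreement assets rather than
# re-reviewing the whole dataset. Field names and threshold are hypothetical.

def select_assets_for_review(assets, consensus_threshold=0.7):
    """Return only assets whose inter-annotator consensus falls below the
    threshold, sorted worst-first so reviewers see the most problematic
    labels first."""
    flagged = [a for a in assets if a["consensus"] < consensus_threshold]
    return sorted(flagged, key=lambda a: a["consensus"])

assets = [
    {"id": "img_001", "consensus": 0.95},
    {"id": "img_002", "consensus": 0.55},
    {"id": "img_003", "consensus": 0.40},
    {"id": "img_004", "consensus": 0.88},
]

to_review = select_assets_for_review(assets)
print([a["id"] for a in to_review])  # only 2 of the 4 assets need a second look
```

With a filter like this, a review pass touches only the slice of the dataset where quality is actually in doubt, rather than every labeled asset.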
“Other tools are focusing more on productivity, while Kili Technology is designed for quality”
With a tool focusing on productivity instead of quality, you risk having to rework huge amounts of data, which ends up being more time-consuming than initially expected.
Poor user experience
Productivity-based tools usually promote a smooth UX that makes labeling comfortable and streamlined. In our experience, though, Machine Learning Engineers' (MLEs) main focus is high-quality labeling with no compromise on efficiency. User interfaces optimized for productivity handle large volumes of assets well but lack ease of use when precision matters: reviewing an asset, sending it back to the labeler for correction, or confirming model-generated pre-annotations are all poorly handled. Customers complain that these actions tend to be click-heavy, which makes both labelers' and reviewers' jobs tedious and adds to the overall complexity.
Incomplete or undocumented API
ML teams spend 90% of their time working with APIs. Therefore, a good API plus the option to integrate active learning in the workflow are key factors in any labeling effort. It is essential, then, that the API be complete, well structured, and well documented. Many of our competitors do provide access to their APIs, but, because of access issues, their customers struggle to make these APIs match their real-life needs.
“The API of the competition has multiple blockers. It is not possible to prioritize data via the API to implement active learning, there is a lack of clear recipes, and the documentation is unclear and unfinished.”
“We had a hard time creating custom workflows using the API. Assets prioritization was not available using the API, and we experienced latency when using it.”
“The competition doesn’t allow us to run pre-annotations on all task typologies, and the SDK is slow and too restrictive.”
By providing an API that is accessible, broad, and well-documented, we help our customers perform their labeling tasks in a few lines of code, rather than dozens.
Lack of reactive customer support
Having dedicated a substantial chunk of their budget to tooling, customers naturally expect strong support from their tool provider. They want to be able to discuss the roadmap and successfully manage the project, and when they face issues with their complex datasets, they want to be able to troubleshoot them quickly. When the provider is not available or active enough in customer support, customers find themselves with no guidance and no vision of progress.
Our industry customer did not get replies from the competitor's support for months, which destroyed whatever trust existed between the parties. The lack of proper support turned out to be the tipping point that sent them looking for a different data labeling vendor.
Choosing a labeling solution
To avoid having to switch vendors mid-project, there are many questions MLEs should consider to make the right decision:
What is my use case? What type of data will I be labeling?
How many people will be involved in the labeling, what are their skills, and what are my quality constraints? Does the solution help me set up an adequate workflow?
Will my data labeling solution enable me to review work done by labelers and continuously review my data as the dataset evolves?
Can I focus on specific assets during my review? Can I track changes over time? Can I interact with labelers in-app?
What data types are supported for labeling?
What is the learning curve from novice to master? How does the software vendor help me unlock value faster?
Where is the solution hosted? Given my company's data protection policies, what are my options?
And many more! To make sure that you consider all the vital aspects when choosing the labeling platform, we’ve prepared an actionable checklist for you to use.
To keep in mind
When this customer met with our engineers, they clearly stated their expectations: one million images to annotate in the coming year, high levels of performance, a smooth labeling UI, a strong focus on training data quality, and the ability to dive into the data as needed.
At Kili Technology, our purpose is to provide a labeling platform to produce high-quality training data. Our core quality features focus on data filtering, data review, quality analytics and comparison metrics, programmatic QA, plugins to control your labeling, and more. Productivity is built into all these features.
That is why we see many customers shift from productivity to quality: it is the only way to create training datasets that will enable you to deploy functional and trustworthy models in production.