You might not know this, but:
The MNIST dataset has a label error rate of 3.4%, yet it is still cited by more than 38,000 papers.
The ImageNet dataset, with its crowdsourced labels, has an error rate of 6%. This dataset arguably underpins the most popular image recognition systems developed by Google and Facebook. Systemic errors in these datasets have real-world consequences: models trained on erroneous data are forced to learn those errors, leading to false predictions or to retraining on ever-increasing amounts of data to “wash out” the mistakes.
Every industry has begun to understand the transformative potential of AI and has started investing in it. But the transformer revolution in ML and the relentless focus on model optimization are reaching the point of diminishing returns. What else is there?
Data Quality is critical
Reducing or eliminating labeling errors, getting annotations right the first time, and focusing on the data that actually feeds machine learning models now pay huge dividends.
However, data quality can be the most difficult part of developing a reliable model, because it requires coordination between human intelligence, modeling expertise, project management, and the technology that binds them all together.
This can often be a painful endeavor. The real differentiator between businesses that succeed at AI and those that don’t is data quality: what data is used to train and test the algorithm, how it is gathered and labeled, and how it is governed. Our customers’ experience, and our own, is that the move to Data-Centric AI (DCAI) is the most important shift businesses need to make today.
Data Quality is priceless
Human-labeled data is becoming the fuel and compass for AI-based software systems. But the increasing focus on the scale, speed, and cost of building and improving datasets has come at the expense of data quality, and thus model quality.
We have seen reasons for concern first-hand: fairness and bias issues in labeled datasets, quality issues in benchmark datasets, benchmark limitations, reproducibility issues in machine learning research, lack of documentation and data replication, and unrealistic performance measures.
Data Quality is complex
While the quality of datasets remains everyone's primary concern, the way it is measured in practice is poorly understood and sometimes just plain wrong.
Data quality is complex: it is not just software bugs or human errors. It is typically the result of how well the annotation is done, how well a dataset and its annotation ontology represent the actual task, and whether the available quality metrics are suitable for the job.
Data annotation is complex because there are multiple interpretations of the truth, because some annotation gestures are hard to perform, and because collaboration requires complex communication and synchronization.
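To make the point about quality metrics concrete: raw agreement between two annotators can look reassuring even when many of their matching labels are due to chance. A chance-corrected measure such as Cohen’s kappa is often a better fit. Here is a minimal, self-contained sketch (illustrative only, not tied to any particular labeling tool):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability of matching by chance, from label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "dog"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # → 0.67
```

Here the two annotators agree on 5 of 6 items (83%), but after correcting for chance the kappa drops to 0.67, which is why agreement metrics need to be chosen to suit the task.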
The development of tools to make repeatable and systematic adjustments to datasets has lagged.
At Kili Technology, we want to reverse this and find new and systematic ways to promote seamless interactions between humans and data.
Models have to be developed iteratively
When developing a model, labeling and model testing should proceed in parallel, removing the trial-and-error time otherwise spent trying to improve a model on top of inconsistent data.
So, to be cost-effective, the model development infrastructure must be tightly integrated with a supervision layer, so that labeling, model training, and model diagnostics can run in parallel and directly shape the data used by the AI system.
The future of AI is getting the best out of humans and machines by creating a human-in-the-loop machine learning process, thus dramatically accelerating the setup of reliable AI applications.
At Kili Technology, we firmly believe that focusing on high-quality, consistently labeled training data is the way to unlock the value of AI.
Our platform’s purpose is to enable businesses to label high-quality datasets to train trustworthy AI.
Kili Technology began as an idea in 2018. Edouard d’Archimbaud, our co-founder and CTO, was working at BNP Paribas, where he built one of the most advanced AI Labs in Europe from scratch. François-Xavier Leduc, our co-founder and CEO, knew how to take a powerful insight and build a company around it. While all the AI hype was on the models, they focused on helping people understand what was truly important: the data.
Together, they founded Kili Technology to ensure data was no longer a barrier to good AI. By July 2020, the Kili Technology platform was live and by the end of the year, the first customers had renewed their contract, and the pipeline was full. In 2021, Kili Technology raised over $30M from Serena, Headline and Balderton.
Today Kili Technology continues its journey to enable businesses around the world to build trustworthy AI with high-quality data.
Without them, are we even human?
- keeps us climbing the highest peaks
- keeps us trying and discovering things anew
- keeps us focused on what matters most: people
- keeps us from going in circles
- keeps us on the winning team
Trusted by the world’s best companies
Our clients and investors trust Kili to take the AI industry to new and exciting places.