What is data annotation for image, text and voice?
We all sense that artificial intelligence will transform our society; however, we do not know how or to what extent.
Two things are certain:
- The subject, long confined to laboratories, is moving on to industrial applications
- And it’s going to transform a lot of things
Ray Kurzweil believes that by 2035, the human brain will be equalled and overtaken by computing power.
Artificial intelligence means automatically processing unstructured data
As scientists do not yet really know how the human brain works, I am not going to get into a broad definition of what artificial intelligence is or is not.
To put it simply, it means being able to process unstructured data. That is 80 to 90% of the available data: images, emails, chats, phone calls, scans, news, …
Until recently, we didn’t know how to process it. Today, this is possible thanks to the technological revolution in deep learning.
What we call AI at Kili is the ability to process this data with the help of the machine.
AI is already shaping the future of many companies and is deeply transforming all sectors
Take the automobile: just 10 years ago, it was the industry of oil and pistons. It is now becoming one of the most data-generating industries, thanks to Tesla and Google. General Motors and Renault still sell cars, but they will either offer a mobility service or lose control of their value chain by ceding ground to Uber.
Take health: in France alone, the number of deaths linked to medical errors is estimated at several tens of thousands per year (even if doctors cannot agree on the figures). Tomorrow, radiologists and oncologists will all be assisted in their diagnoses and prescriptions, so that they can focus on what makes their value: discernment. Artificial intelligence is already beginning to save lives because it is better suited than humans to these complex and highly specialized tasks.
And on a more cross-functional note, AI transforms operating methods and is a source of competitiveness. It lightens simple, low-value-added tasks, in back offices for example, and it makes the most of the complementarity between human and machine: the machine excels at repetitive tasks with limited scope, where humans tire quickly, which restores the value of human expertise and discernment, where the machine remains stupid.
Labelled data is the heart of the matter
To make AI you need 3 components: algorithms, computing power and data.
- We all have the computing power of the Apollo program on our phones. Not to mention what can be rented in a few clicks from Google or AWS.
- State-of-the-art algorithms are available as open source on GitHub.
- For the data, it is not the volume that is missing: by 2025, we should reach 175 zettabytes (stored on Blu-ray discs, enough to reach the moon… 23 times). The challenge is to make it digestible by models: that is annotation. And we have all already done it on Facebook, when we tagged our friends in pictures.
In short, the key to industrializing AI is data, and more precisely labelled data.
Kili: create better training data twice as fast
Today, 80% of projects fail or remain at the proof-of-concept (POC) stage, largely due to the lack of image, text and audio training data in sufficient quality and quantity. Few companies have yet understood that they need to create datasets for the problems they want to solve, rather than choosing problems for which data happens to be available. And even when this is understood, annotated data remains one of the main bottlenecks to the deployment of machine learning. Since the annotation task is mainly manual, it can be very expensive and time-consuming.
Kili allows you to:
- create simple and intuitive custom interfaces to allow the business to perform the task of annotating images, text, and audio
- accelerate annotation by implementing online learning to pre-annotate, active learning to focus on the most impactful elements, and weakly supervised learning to massively accelerate the task
- control the quality of the data produced
- easily integrate into a data science pipeline
- and facilitate human supervision in production
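
To make the active learning idea from the list above more concrete, here is a minimal sketch of one common strategy, uncertainty sampling: the items the model is least sure about are sent to annotators first. This is an illustration only, not a description of how Kili implements it. The function names, the item ids and the probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    predictions: dict mapping an item id to its predicted class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabelled emails (spam / not spam).
predictions = {
    "email_1": [0.98, 0.02],
    "email_2": [0.55, 0.45],
    "email_3": [0.80, 0.20],
    "email_4": [0.50, 0.50],
}

to_label = select_for_annotation(predictions, budget=2)
print(to_label)  # ['email_4', 'email_2']
```

With a budget of two annotations, the two emails the model hesitates on are labelled first, while the confident prediction on `email_1` can wait. In a real project the probabilities would come from a model retrained as new labels arrive.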