Over the last few years, Machine Learning engineers and data scientists have focused mainly on improving artificial intelligence models in order to achieve better results in many areas, among them image recognition, automatic speech recognition, and video or text processing.
However, it seems clear today that improved deep learning models deliver better results when trained on standard, curated data sources such as ImageNet, LibriSpeech, or Common Voice. When trained on cleaned and improved datasets, they regularly beat the state of the art on specific use cases.
This observation motivates a new, data-centric approach: improving the data, and in particular its labels, in order to improve the results of artificial intelligence algorithms.
Photograph by Marco Verch
We are convinced that this approach is more than viable because it adapts much better to the specific use cases of each industry.
Moreover, it requires fewer technical skills, which lets Machine Learning engineers and data scientists focus on their area of expertise.
With this objective in mind, we are offering a week-long community challenge built around an image recognition use case and the data-centric approach.
During this challenge, we will show that improving the data yields a significant improvement in results without any technical change to the training model.
To do this, we chose an object detection task using the YOLOv5 training script. The model will be retrained daily, and the training script will remain unchanged during the week. Meanwhile, we will improve the quality of the data and look at the impact on metrics such as loss or training time.
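The setup described above (a fixed training script, with only the data changing from day to day) can be sketched as follows. This is a minimal illustration, not the challenge's actual pipeline: the hyperparameter values, the dataset file names, and the run names are all hypothetical. The command-line flags are those of the `train.py` script in the YOLOv5 repository as we understand them, so check them against the version you use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixedTrainingConfig:
    """Training settings that stay frozen for the whole week.

    All default values here are hypothetical placeholders.
    """
    weights: str = "yolov5s.pt"
    img_size: int = 640
    batch_size: int = 16
    epochs: int = 50


def build_train_command(config: FixedTrainingConfig, data_yaml: str, run_name: str) -> List[str]:
    """Build a YOLOv5 train.py command; only data_yaml and run_name vary per day."""
    return [
        "python", "train.py",
        "--weights", config.weights,
        "--img", str(config.img_size),
        "--batch-size", str(config.batch_size),
        "--epochs", str(config.epochs),
        "--data", data_yaml,
        "--name", run_name,
    ]


CONFIG = FixedTrainingConfig()

# Hypothetical dataset versions: one YAML file per day of data improvements.
DAILY_DATASETS = ["dataset_day1.yaml", "dataset_day2.yaml", "dataset_day3.yaml"]

for day, data_yaml in enumerate(DAILY_DATASETS, start=1):
    command = build_train_command(CONFIG, data_yaml, run_name=f"day{day}")
    # To launch the run from inside a YOLOv5 checkout, one could call
    # subprocess.run(command, check=True); here we only print the command.
    print(" ".join(command))
```

Because the configuration object is frozen, any difference in loss or training time between two daily runs can be attributed to the dataset version rather than to a changed hyperparameter.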
The goal is to demonstrate that the data-centric approach is one path towards less biased and more efficient algorithms while requiring fewer technical skills.
The topic of the dataset will be revealed soon.
Please join us for the challenge!