Data Labeling

Computer Vision

Make Training Process Productive With Kili AutoML and Weights & Biases

A guide on how to ake the training process more productive by empowering your pipeline with Kili AutoML and Weights and Biases.

Kili Technology

Jun 28, 2023

Heading2

Heading3

AI Summary

Introduction

With the increase in the complexity of AI-based projects and the huge increase in the sizes of the datasets used, it became painful and sometimes unfeasible to use the traditional pipeline of labeling and training. Also, the increase in the number of hyper-parameters of the models makes it pretty hard to manage the training experiments and to compare the performance among various trained models. In this article, we will explain a very powerful couple of tools to produce a comprehensive AI training pipeline. The first one is Kili AutoML, which is provided by Kili, and the other one is Weights and Biases. To be more specific, the article will take into consideration the surface defect detection task using the scratches class of the NEU-DET dataset.

What is Data-Centric AI?

First of all, let’s start by understanding the concept of data-centric AI, which is an artificial intelligence approach that focuses on the data to make sure that it represents what the desired AI system should learn. Also, the concept of data-centric AI prioritizes the quality of the data over its quantity. In other words, instead of enhancing the structure of the model that will be trained, this approach suggests enhancing the quality of the data to improve the performance of AI-based systems.

What is Kili AutoML Project?

AutoML is a tool provided by Kili that gives the ability to train models and to use them to label the rest of the data automatically. In this way, we save labeling time and make the labeling process faster and more consistent. For now, AutoML supports natural language processing tasks such as Named Entity Recognition and Text Classification and Computer Vision tasks like object detection and image classification.

What is Weights and Biases? How Does It Work with Kili AutoML?

W&B is a product that provides a dashboard to monitor the whole training process. It embeds various visualizations to compare the trained models and trace the performance of the models. Using W&B, we can understand the effect of the parameters used for training among all training experiments.

Use case: Surface Defect Detection

Task Description

Surface Defect Detection is the task of detecting defects on surface pictures and classifying the detected defect into specific classes.

Dataset Explanation

NEU-DET is a dataset provided by Northeastern University. It contains 6 classes of defects with an average of 240 images for training and 60 images for validation per class.

Classes of NEU-DET

Traditional Pipeline

To achieve this task, we can use the traditional pipeline which suggests we start by constructing the dataset, after that, completely manually labeling it one by one, and finally, training the desired model.

If we investigate the following figure, we notice that it is suitable only for simple tasks, but for complicated tasks, it is not feasible to annotate the data completely in a manual way.

Traditional training data pipeline

Kili Automl Empowered Pipeline

To solve the problem explained in the previous paragraph, Kili provides AutoML to automate some parts of the process of labeling which leads to speeding up the whole process.

By using AutoML we start the training process by collecting the unlabeled data; then, we import the dataset into Kili; then we label only a part of the dataset in the labeling interface. After that, using the `train` command, we obtain the first model. If we get partially satisfied by the model, we call the `predict` command to automatically label another part of the dataset so the labelers do not need to label that part but only need to verify it.

Time Comparison

Labeling a photo that contains 3 different objects needs the following steps:

Inspecting the image ~ 4 seconds.
Choosing the class ~ 1 second.
Drawing the bounding box ~ 2 seconds.

Therefore, it needs a minimum of 13 seconds. Conversely, verifying the labels of an image that contains 3 objects annotated with a partially good model needs the following steps:

Inspecting the image ~ 4 seconds.
Fixing the label if misclassified: [0,1] seconds.
Fixing the bounding box: [0,2] seconds.

As a result, we notice that by importing the auto-generated labels, the time of drawing and choosing the class will be reduced, and by the improvement of the model over time it will be negligible. Therefore, using AutoML can potentially reduce the time of labeling by a percentage of about 80%.

To conclude, the following figure describes the AutoML-empowered pipeline.

AutoML empowered training pipeline

Even though the improvement suggested by Kili is speeding up the process of training, the previous pipeline still has an issue. In complex tasks of supervised learning, we need to fine-tune the hyper parameters of the models. Therefore, this leads to the need for an effective way to report and compare the performance of trained models to choose the best one.

Kili AutoML and W&B Empowered Pipeline

Weights and Biases provides a solution that can help us report and compare models during the training process. W&B provides a dashboard that is associated with a project to visualize the changes in the performance due to various parameters among different models.

The good news here is that Kili-AutoML supports W&B, so the `train` command is integrated with W&B. Therefore, we can use the power of W&B while we are training our models using AutoML. The following figure describes the final pipeline that is empowered by AutoML and W&B

AutoML and W&B empowered training pipeline

Creating New Project On Kili

Firstly, we can start by creating a new project on Kili platform like the following:

Creating a new project on Kili platform

Later on, we upload our unlabeled dataset and then we specify the classes that we have.

Dataset uploading interface on Kili platform

Class configuration on Kili platform

Finally, we end up with a project dashboard that contains the uploaded assets with the ability to filter and sort them according to last update, labeller and status.

Project dashboard after configurations

Subsequently, we start to annotate our dataset by labeling a small part of the assets that we imported to the workspace. Kili provides a smooth interface for labeling with very practical keyboard shortcuts. Also, it allows us to inspect the images with different brightness and contrast values and to rotate the images.

Kili labeling interface

Train Command Usage

Basically, to be able to run the `train` command we need to acquire a Kili API key and the ID of the Kili project we are working on; this can be acquired from the Kili application settings, and can be passed directly to the Kili commands or set as an environment variable. After manually labeling a part of the data, we can call the script in the following way after cloning the project and installing the required packages:

kiliautoml train --api-key YOUR_API_KEY --project-id YOUR_PROJECT_ID

Note: Run kiliautoml train --help for other training configurations.

Performance Reports

For our task, the script will train a YOLOv5 detection model and by the end of execution of the command, we get reports about the performance in both the terminal and W&B dashboard:

Terminal performance log

If we check the logs on the W&B dashboard, we can see that it provides several types of evaluations, both per iteration and global. For example, the following image shows the evaluation metrics for one iteration:

W&B evaluation metrics report for one iteration

Also, we can see the visualized loss of training and validation in the loss section:

W&B loss report for one iteration

For more insightful reports, W&B embeds a section for inference samples that shows the output of the model over some samples from the validation set like the image below:

W&B dashboard’s inference samples

W&B also provides sections like training batch mosaics, confusion matrix, F1 curve, PR curve, P curve, R curve, learning rate graph, and bounding box debugger. In addition, to be able to monitor the training experiments, W&B provides detailed reports and comparisons between the intermediate models. For example, the following chart reports the evaluation metrics of all trained models; each curve is associated with the ID of the experiment:

W&B evaluation metrics report for all iterations

Similarly, W&B provides comparisons between the losses of different training experiments as a chart like the following:

W&B loss report for all iterations

Finally, W&B gives reports about the training environment and the used hardware usage. For example, the following charts report the usage of the GPU of the training machine:

W&B system report for all iterations

W&B embeds further insights about the system like GPU memory allocated, GPU time spent accessing memory, GPU temp, GPU utilization, network traffic, disk utilization, CPU threads in use, memory available and CPU utilization. For better portability, at the end of our work, we can use the “Create report” button to export all the charts as a report that can be downloaded in LaTex or PDF formats.

Predict Command Usage

If we get partially satisfied by the model, we can call the `predict` command in the following way:

kiliautoml predict --api-key YOUR_API_KEY --project-id YOUR_PROJECT_ID

The command uses the trained YOLOv5 model to do inferences over the unlabeled data. Subsequently, it generates the labels and sends them to Kili servers to store them in the project. After the execution of the command, we check out the automatically labeled data from Kili Dashboard:

Traditional training data pipeline

In the image above, we notice that some of the data is labeled by Kili-AutoML; the annotations of these images are automatically generated by the execution of the `predict` command and are ready for verification. Kili shows predictions produced by AutoML in dashed bounding boxes to distinguish them from the verified bounding boxes. The following images show the bounding boxes before and after verification.

Predicted and verified bounding boxes on Kili dashboard

How to Compare Models Using W&B Dashboard?

Since the process is iterative, W&B assigns a unique ID to each version of the obtained models. We can see the comparison of every iteration on W&B dashboard so we can figure out the effect of increasing the data, and also, we can monitor the behavior of the models trained with different parameters, so we change the parameters carefully. Finally, comparing the various obtained models, we choose the best one.

Conclusion

In this article, we started to solve the problem of surface defect detection with a traditional AI training pipeline. However, we detected two weaknesses in the pipeline that led to a loss in time and productivity; which were the need to label all the data manually and the difficulty of monitoring the training experiments. On the other hand, we noticed that empowering the pipeline with AutoML saves us a lot of time in terms of labeling. Later on, we found out that using W&B helps us to manage the process of training models in a more productive way that helps us to choose parameters in an optimized way.

An article by Asmaa Mirkhan

Subscribe for updates

Stay updated with the latest news, articles and update directly into your box

May 7, 2026

Data Story: What Kimi K2.6 Actually Tells Us About Open-Weight Coding Models

Moonshot AI's Kimi K2.6 is the first open-weight model to credibly out-score GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro — and it shares a published architecture with K2.5. Everything that changed sits in the training recipe.

Kili Technology

Foundation Models

LLMs

March 23, 2026

Data Story: A Deep Dive into Deepseek V4 (Updated May 2026)

DeepSeek V4's technical report reveals a training data strategy built for million-token contexts: 32T+ tokens, specialist domain experts trained independently, and a distillation pipeline that merges ten teacher models into one. This analysis breaks down what the data decisions mean for practitioners building their own AI systems.

Kili Technology

Foundation Models

LLMs

April 30, 2026

Custom AI Benchmark Guide: What the Best Public Evals Teach You About Building Your Own

The public benchmarks that defined AI progress for the last three years are saturating in months, not years, and a 445-benchmark systematic review found most don't accurately capture what their names claim. This guide distils the design choices behind HELM, GPQA Diamond, SWE-bench, LiveCodeBench, LegalBench, and MultiMedQA into a practitioner methodology for building AI benchmarks you can actually trust.

Kili Technology

LLMs

AI Evaluation

Foundation Models

Make Training Process Productive With Kili AutoML and Weights & Biases

Table of contents

AI Summary

Introduction

What is Data-Centric AI?

What is Kili AutoML Project?

What is Weights and Biases? How Does It Work with Kili AutoML?

Use case: Surface Defect Detection

Task Description

Dataset Explanation

Traditional Pipeline

Kili Automl Empowered Pipeline

Time Comparison

Kili AutoML and W&B Empowered Pipeline

Creating New Project On Kili

Train Command Usage

Performance Reports

Predict Command Usage

How to Compare Models Using W&B Dashboard?

Conclusion

Subscribe for updates

Related articles

Data Story: What Kimi K2.6 Actually Tells Us About Open-Weight Coding Models

Data Story: A Deep Dive into Deepseek V4 (Updated May 2026)

Custom AI Benchmark Guide: What the Best Public Evals Teach You About Building Your Own

Ready when you are. Start your free trial.