Top Mistakes to Avoid When Fine-tuning Computer Vision Models
Want to avoid the pitfalls of fine-tuning and improve your computer vision model's performance? Read on.
Introduction
Deep learning model fine-tuning is a key operation when dealing with transfer learning.
What is Transfer Learning?
Transfer learning is the process of taking a model trained on one machine learning task, commonly on a large and general dataset, and retraining it on a different task or dataset. Why is transfer learning interesting? It is a good way to handle small datasets and to obtain an accurate machine learning model at a fraction of the cost of training from scratch. Transfer learning leverages the effort already spent training models, as well as the representations they have learned that are relevant to the target task.
For example: if the ML task is to recognize different species of fish, you can perform transfer learning on a model that has been trained on a generic image recognition task like ImageNet.
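As a minimal illustration (PyTorch and torchvision assumed here; the library choice is ours, not prescribed by the article), transfer learning starts from weights learned on the source dataset instead of a random initialization:

```python
import torchvision.models as models

# Load a ResNet-50 whose weights were learned on ImageNet (the source task).
# These weights, rather than a random initialization, become the starting
# point for the fish-species classifier (the target task).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
```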
What is model fine-tuning?
Model fine-tuning is the most common technique for performing transfer learning.
In this section, we will call the source model the original model that has been trained on a large dataset (the source dataset) to solve the source Machine Learning task, and the target model the model trained on the target task through transfer learning. There are several ways to fine-tune a computer vision model:
replace or reset the last layer of the deep neural network: in this setting, the last layer can be replaced by another layer, for example, when the number of target classes is different from that of the original machine learning task. The other layers are kept intact, or frozen. Once the last layer is replaced, its weights are initialized randomly, and training consists in optimizing only the weights of this new layer on the target dataset.
only fine-tune the last layer of the deep neural network: here, the last layer of the source model is kept but serves as an initialization for the target Machine Learning task. This type of fine-tuning is useful when the target task deals with the same classes as the source task. In this case, the learning rate is set lower so as not to forget the decision function learned on the original problem.
fine-tune all the weights of the deep neural network: finally, another way to perform fine-tuning is to use the whole source model as an initialization and to optimize all the model weights on the target dataset. As mentioned above, the learning rate should be reduced to avoid forgetting what has been learned in the original task. The last layer can also be replaced in this setting. This method can be used when the one above does not provide accurate results, especially when the target task is too different from the original one.
There are many variants of these methods, such as replacing the last layer with several new layers, or gradual unfreezing, where the learning rate depends on the layer (the higher the layer, the higher the learning rate). A minimal sketch of the main strategies is shown below.
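Here is a rough sketch of the first and third strategies (PyTorch/torchvision assumed; the number of target classes and the learning rates are made-up examples):

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_target_classes = 12  # hypothetical number of classes in the target task

# Strategy 1: freeze the source model and replace the last layer.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                                   # keep the backbone intact
model.fc = nn.Linear(model.fc.in_features, num_target_classes)    # new head, randomly initialized
head_optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Strategy 3: use the whole source model as initialization and optimize all
# weights, with a smaller learning rate so the original representations are
# not forgotten.
full_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
full_model.fc = nn.Linear(full_model.fc.in_features, num_target_classes)
full_optimizer = torch.optim.Adam(full_model.parameters(), lr=1e-5)
```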
The following sections will describe the most common mistakes you should avoid when doing model fine-tuning in computer vision.
Using a too-small dataset
Importance of dataset size
When you are doing model fine-tuning, even if you need way less data than when you are training from scratch, you still need to cover the diversity of the data that you may encounter at the model inference stage.
Transfer learning is always the first method to try when you are dealing with a small dataset because of its simplicity, but it does not perform as well as specific few-shot learning methods. Here is a warning before jumping into few-shot learning methods, though: they can reach decent accuracy with a very small number of samples, but they also come with a number of limitations. For example, the ones that use meta-learning can easily adapt to small datasets, but only provided they have been trained to adapt to similar datasets…
Fine-tuning helps but still…
Recent research has shown that transfer learning is not as general as previously thought: its performance decreases when the source and target tasks are dissimilar. For example, this poses a problem in the medical imaging field, where transfer learning is often performed using a network trained on ImageNet, which is designed to classify objects in color images, such as cats and dogs. More specifically, for chest X-ray images analyzed for a specific condition, training on datasets composed of ankle X-rays or ImageNet gives similarly weak results, while transferring from chest X-ray images of another condition provides strong results. So similarity between the source and the target tasks is crucial.
Getting more data
When dealing with computer vision, it is very important to perform image augmentation and transformation. This won’t solve the issue of a target dataset that is not diverse or a source dataset that is too dissimilar from the target dataset, but this is a mandatory step. For more, you can check out this article published on Towards Data Science on how to handle the small dataset issue.
A step that can guarantee that you have enough quality data is to leverage a data annotation platform like Kili Technology, where you can ingest the data you expect to process at inference time and then annotate it yourself, or get it annotated by the experts of your choice or by an on-demand workforce.
Adapting too much to the target dataset
The most critical parameter to act on when fine-tuning is the learning rate, which should be adapted to your dataset size. Decrease the learning rate to stay closer to the original model and avoid adapting too much to the target dataset.
Picking a fine-tuning method not adapted to your dataset
If you have a rather small dataset, you should favor tuning only the last layer. If your dataset is large, you can afford to train the full network, as in the sketch below.
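One way to encode this heuristic in code (the dataset-size threshold below is purely illustrative, not an established rule, and a ResNet-style head named `fc` is assumed):

```python
import torch
import torch.nn as nn
import torchvision.models as models

def configure_finetuning(model: nn.Module, num_train_images: int):
    """Pick a fine-tuning setup based on a (hypothetical) dataset-size cut-off."""
    if num_train_images < 5_000:                        # small dataset: train the head only
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith("fc")
        return torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    # larger dataset: fine-tune all weights, with a reduced learning rate
    return torch.optim.Adam(model.parameters(), lr=1e-5)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
optimizer = configure_finetuning(model, num_train_images=2_000)
```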
Screwing up the evaluation
As for any Machine Learning task, a separate validation dataset and an appropriate evaluation metric are necessary to measure the model's performance accurately. Having a validation set is even more critical in fine-tuning than in other Machine Learning approaches: we do not want to tune the model too much to the training data, especially in the case of a small dataset. A validation dataset without leakage from the training dataset helps ensure that.
Not picking the validation dataset carefully
To keep it short, picking a wrong validation dataset is as detrimental to your model fine-tuning as training it on a dataset that is too small. As you are probably aware, the validation dataset is critical in fine-tuning computer vision models: it contributes to evaluating the model's generalization performance and to avoiding overfitting or underfitting. As a result, it's important to carefully select the validation dataset you'll use to ensure that it represents the diversity and complexity of the real-world scenarios that your computer vision model is expected to encounter.
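Here is a small sketch of one way to build a leakage-free split (scikit-learn assumed; the file names and group ids are invented for illustration). Keeping all images from the same capture session on one side of the split prevents near-duplicates from leaking into the validation set:

```python
from sklearn.model_selection import GroupShuffleSplit

# Toy example: 8 images coming from 4 capture sessions ("groups").
image_paths = [f"img_{i}.jpg" for i in range(8)]
labels      = [0, 0, 1, 1, 0, 1, 0, 1]
groups      = ["sess_a", "sess_a", "sess_b", "sess_b",
               "sess_c", "sess_c", "sess_d", "sess_d"]

# Whole groups go either to training or to validation, never to both.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, val_idx = next(splitter.split(image_paths, labels, groups=groups))
print("train:", [image_paths[i] for i in train_idx])
print("val:  ", [image_paths[i] for i in val_idx])
```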
Not picking the right metric
Picking the right metric to evaluate your model is critical for the success of your fine-tuning. Indeed, using metrics adapted to your task, especially if the original model was not trained on the same task, helps to ensure that the model is optimized for the desired objective and can help guide the training process to achieve better results.
Here are several relevant metrics that can be used when fine-tuning your computer vision model (a short sketch after the list shows how a few of them can be computed). Of course, it's not a one-size-fits-all situation: you should always pick your metric depending on the specific problem and the desired outcome.
1. Accuracy: This is the most common metric for evaluating classification models. It measures the percentage of correct predictions made by the model.
2. Precision and Recall: These are important metrics for binary classification problems. Precision measures the percentage of true positive predictions among all positive predictions, while recall measures the percentage of true positive predictions among all actual positive samples.
3. F1-score: This is a combined metric that balances precision and recall. It is the harmonic mean of precision and recall.
4. Intersection over Union (IoU): This metric is commonly used for object detection and segmentation problems. It measures the overlap between the predicted and actual bounding boxes or masks.
5. Mean Average Precision (mAP): This metric is also commonly used for object detection problems. It measures the average precision (the area under the precision-recall curve) per class, averaged over classes and often over several IoU thresholds.
6. Mean Intersection over Union (mIoU): This is a common metric for semantic segmentation problems. It measures the average IoU across multiple classes.
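For illustration, here is how a few of these metrics could be computed (scikit-learn assumed for the classification metrics; the predictions and bounding boxes are toy values):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy binary-classification predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print("IoU      :", box_iou((10, 10, 50, 50), (30, 30, 70, 70)))
```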
Not using data augmentation
Overall, data augmentation is an essential technique for improving the performance of computer vision models. This is especially true in scenarios where the available data is limited or the model is prone to overfitting.
This technique artificially increases the size of a dataset by creating modified versions of existing data, e.g., by slightly adjusting the lighting of some images or rotating them.
Data augmentation can help prevent overfitting and improve the performance of a model.
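As a minimal sketch (torchvision assumed; the exact transforms and parameters are illustrative and should mirror the variations you expect at inference time):

```python
from torchvision import transforms

# A simple augmentation pipeline applied to training images only.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # adjust the "light settings"
    transforms.RandomRotation(degrees=15),                  # rotate images slightly
    transforms.ToTensor(),
])
```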
Not trying to fine-tune all layers of the model
When fine-tuning a model, it can be worth it to try updating all the layers of the model.
Fine-tuning all layers of a computer vision model is likely to:
help the model learn task-specific features,
lead to higher performance, especially if the new task is significantly different from the pre-training task and requires the model to learn new features at all levels of abstraction.
Although the optimal fine-tuning strategy will vary depending on the task and the training data, fine-tuning all layers of your computer vision model should be a go-to option.
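Here is a sketch of what this can look like in practice (PyTorch/torchvision assumed; the per-layer learning rates and the number of target classes are illustrative), combining full fine-tuning with the layer-dependent learning rates mentioned earlier:

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 12)   # hypothetical 12 target classes

# Every layer is trainable, but earlier layers (more generic features) get a
# smaller learning rate than later ones and the new head.
optimizer = torch.optim.Adam([
    {"params": list(model.conv1.parameters()) + list(model.bn1.parameters())
               + list(model.layer1.parameters()) + list(model.layer2.parameters()), "lr": 1e-6},
    {"params": list(model.layer3.parameters()) + list(model.layer4.parameters()), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-4},
])
```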
Top mistakes to avoid when fine-tuning computer vision models: final thoughts
And... That's all, folks! You're now ready to go ahead and fine-tune your own computer vision models while avoiding the most common mistakes of the process.