Medical training data annotation

Kili Technology makes annotating medical images and text simple and fast. Import 2D DICOM images, 3D CT scans or MRI, then classify, draw bounding boxes, trace polygons or segment to identify suspicious skin spots, lesions, tumours and brain haemorrhages. Build your training datasets with highly customizable interfaces that let you combine tasks to improve productivity.

What is medical training data?

Today, petabytes of medical data are digitized across healthcare institutions: public hospitals, retirement homes, medical clinics, pathology laboratories and more. Unfortunately, these 2D and 3D images, CT scans and MRI data are often disorganized and unstructured. Unlike standard transactional business data, patient data cannot be used directly to build machine learning models.

This data must first be annotated: the objects of interest in images, or the relevant entities in text, must be labelled so that machines can learn to recognise them with computer vision and NLP.

With medical training data of sufficient quality and quantity, the potential use cases for the industry are almost endless, from AI-assisted radiology and pathology to the identification of rare or hard-to-diagnose diseases.

Where and how can I find enough data to build a medical application?

Many researchers around the world are looking to use computer vision models to detect skin cancer, brain tumours and other visually diagnosable diseases. However, creating and training these models requires access to large amounts of annotated medical image data.

Finding some datasets is not hard: a search for “medical datasets” in your favorite search engine will turn up many.
However, for a model to make accurate predictions, it must be trained on a large amount of high-quality data specific to the problem you want to address.

So for a real-world use case, you will have no choice but to collect data specific to it from a clinic or hospital and label it. Labelling is often expensive, and the resulting labels can be of poor quality. That’s where Kili comes in.

Why choose Kili to generate my medical training data?


Kili handles 2D DICOM images as well as 3D MRI and CT scans, and offers specialized interfaces for medical imaging and NLP annotation tasks: image classification for visual diagnosis; identification of lesions, tumors and cancer cells; entity extraction from medical documents; OCR for medical records; and more.


Kili’s state-of-the-art quality management system enables close collaboration and rigorous review throughout the life of a project, ensuring clean, high-quality medical imaging training datasets.


With Kili, you can annotate wherever you want, with whomever you want: on premises or as SaaS, with your own annotators or ours, remotely or on your site. We adapt to your constraints!


Annotation can be expensive. By supporting online learning, active learning, weakly supervised learning and data augmentation, Kili lets you drastically reduce annotation costs!
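How Kili implements active learning is not described here. As an illustration of the general idea only, the sketch below shows uncertainty sampling, one common active learning strategy: assuming a model that outputs class probabilities for each unlabelled item, the items with the most uncertain predictions (highest entropy) are sent to annotators first, so human effort goes where it teaches the model most. The function names and scan ids are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the ids of the `budget` items the model is least sure about.

    Hypothetical helper illustrating uncertainty sampling: items whose
    predicted distribution has the highest entropy are ranked first.
    """
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical model outputs: [P(lesion), P(no lesion)] for unlabelled scans.
    predictions = {
        "scan_001": [0.98, 0.02],
        "scan_002": [0.55, 0.45],
        "scan_003": [0.10, 0.90],
        "scan_004": [0.50, 0.50],
    }
    print(select_for_annotation(predictions, budget=2))  # ['scan_004', 'scan_002']
```

Confident predictions such as `scan_001` are skipped for now; only the two borderline scans consume the annotation budget.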


Kili has access to a unique worldwide network of medical professionals who can accurately translate, transcribe and annotate medical data, so we can quickly create large, custom medical imaging and NLP training datasets.

Some of Kili’s medical training data interfaces

Diagnostics for medical imaging

Add structure to images with Bounding Box Annotation, Semantic Annotation, Polygon Annotation, Point Annotation, Segment Annotation, Image Classification and more. We support the DICOM image format for AI in radiology.

Entity Extraction for medical documents

Add structure and semantic information to unstructured text at both the document and word level. Take advantage of our weakly supervised learning service to apply business rules, such as regular expressions and dictionaries, and pre-annotate large volumes of text before human review.
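To illustrate what rule-based pre-annotation means in practice (this is not Kili’s weak supervision API, which is not described here), the sketch below applies two hypothetical labelling rules, a small drug-name dictionary and a dosage regular expression, to a clinical note and returns draft entity spans that a human annotator would then confirm or correct. The entity labels, dictionary contents and pattern are assumptions chosen for the example.

```python
import re
from typing import List, Tuple

# Hypothetical business rules: a dictionary of drug names and a regex for dosages.
DRUG_DICTIONARY = {"metformin", "amoxicillin", "warfarin"}
DOSAGE_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|g|ml)\b", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[A-Za-z]+")

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def pre_annotate(text: str) -> List[Span]:
    """Apply the labelling rules and return draft entity spans, sorted by offset."""
    spans: List[Span] = []
    # Dictionary rule: any word found in the drug list is labelled DRUG.
    for match in WORD_PATTERN.finditer(text):
        if match.group().lower() in DRUG_DICTIONARY:
            spans.append((match.start(), match.end(), "DRUG"))
    # Regex rule: a number followed by a unit is labelled DOSAGE.
    for match in DOSAGE_PATTERN.finditer(text):
        spans.append((match.start(), match.end(), "DOSAGE"))
    return sorted(spans)


if __name__ == "__main__":
    note = "Patient started on Metformin 500 mg twice daily; warfarin stopped."
    for start, end, label in pre_annotate(note):
        print(label, repr(note[start:end]))
```

Rules like these are cheap to write but imprecise (they miss misspellings and unlisted drugs), which is why the draft spans still go to a human reviewer.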

OCR for medical records

Crop regions of a document and record the text they contain to build OCR training data. Correct even the most subtle transcription errors: with sensitive medical data, even small errors cannot be tolerated.
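One common way to quantify how small an OCR error is, is the character error rate (CER): the edit distance between the corrected reference text and the raw OCR output, divided by the reference length. The sketch below is an illustration with a hypothetical prescription line, not a description of how Kili scores OCR annotations; it shows why a low CER can still hide a clinically serious mistake.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    previous: List[int] = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "Warfarin 5 mg"    # hypothetical corrected transcription
    ocr_output = "Warfarin 50 mg"  # hypothetical raw OCR output
    print(round(character_error_rate(reference, ocr_output), 3))  # 0.077
```

A single inserted character gives a CER below 8%, yet it turns a 5 mg dose into a tenfold 50 mg dose, which is why every character matters in medical OCR training data.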

Last but not least, create your own interfaces for your specific tasks with Kili’s interface builder!

Ready to simplify labelling in your company?

Success stories


Kili enabled an American scale-up to improve its computer-assisted mammography diagnosis model. The enhanced algorithm interprets mammograms accurately, detecting breast cancer reliably at the level of the best experts.


A French start-up uses Kili to develop a new non-invasive diagnostic tool for bladder cancer detection that is more effective than urinary cytology. Kili speeds up the classification of microscope images and enables semantic segmentation of cancer cells.


Kili helped a developer of speech-to-text solutions specialize its model on medical data to speed up the transcription of radiologists’ reports. The project had three parts: data sourcing, data annotation and data augmentation.