World-Leading OCR Annotation Tool

Label documents and PDFs easily with Kili Technology. Import your assets and label them using bounding boxes, segmentation, polygons, and text transcription. Loop in foundation models or your own to generate pre-annotations. Explore your dataset to find and fix errors. Leave comments for your team. Integrate Kili Technology with your existing stack by leveraging our API & Python SDK.

10x Faster OCR Annotation on All Use Cases

Annotate documents and PDFs efficiently with Kili Technology. Select the entity class with shortcuts, draw the bounding box, and simply check the automatic OCR transcription. Add relations between items. Segment and classify images along with your image transcription. Use foundation models or your own model to generate pre-annotations.

Image & PDF native support (.pdf, .jpeg, .png, .tiff)

In-app image editing

Model-based pre-annotation

Relations

10x Fewer Annotation Errors on OCR Datasets

Deliver datasets of the highest quality with Kili Technology. Streamline your quality review & fix issues in-app with our Explore view. Filter your assets to identify what to improve. Use advanced quality metrics to quantify your training data's quality. Automate programmatic QA with plugins and workflows for a seamless labeling process.

Dataset deep dive

User permissions

Advanced filters

Advanced quality metrics

10x Easier OCR Labeling Ops

Integrate natively with your document processing stack, from Amazon, Google, and Microsoft cloud storage to Abbyy or Tesseract OCR. Simplify access rights management with predefined roles. Give your users an autonomous experience through our SSO integration and keep your IT and security teams happy. Use Kili Technology's OCR annotation tool online or on-premise to facilitate collaboration between business experts, the external workforce, and data scientists.

Single Sign On (SSO)

Remote storage

API & Python SDK

The right OCR tooling

  • Full coverage of annotation tasks: image tagging, text transcription, classification, relations
  • Support for all image files, native PDFs, and image PDFs (.pdf, .jpeg, .png, .tiff, etc.)
  • AutoML & Foundation Models pre-labeling
  • Easy connection to standard OCR engines (Tesseract, Google Vision, OmniPage)
  • Fastest OCR tool with automation, nested ontologies, etc.
  • Quality focus with collaboration interfaces, user permissions, and human and programmatic error detection workflows
  • Smooth labeling ops with SSO, cloud storage, API and Python SDK access

The right expertise

  • On-demand labeling workforce
  • ML & data labeling experts
  • Highest security standards (SOC2, ISO27001, HIPAA, GDPR)
  • Highest levels of customer care with 24/7 support

Frequently Asked Questions

What is an OCR annotation tool?

An Optical Character Recognition (OCR) annotation tool like Kili Technology helps you speed up the labeling of documents that contain text, such as PDFs or images. It combines natural language processing and image annotation.

How do you annotate a PDF?

You can annotate a PDF in different ways with annotation tools: place bounding boxes around text regions and write the text transcription (OCR annotation), place masks around image regions (segmentation), highlight text passages (named entity recognition), or classify the PDF at the document or page level. This lets you annotate both machine-readable text and image files easily.
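To make the idea of an OCR annotation concrete, here is a minimal sketch of how one labeled document could be represented in Python. The class names, field names, and entity classes are hypothetical and chosen for illustration only; they are not the Kili Technology export format or SDK.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class BoundingBox:
    """Axis-aligned box in normalized page coordinates (0.0 to 1.0)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so an inverted box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class OcrAnnotation:
    """One labeled region: where it is, what it is, and what it says."""
    page: int
    box: BoundingBox
    entity_class: str   # hypothetical class name, e.g. "INVOICE_NUMBER"
    transcription: str  # text typed or corrected by the annotator


@dataclass
class DocumentLabel:
    """All annotations for one PDF, plus an optional document-level class."""
    filename: str
    document_class: Optional[str] = None
    annotations: List[OcrAnnotation] = field(default_factory=list)

    def entities(self, entity_class: str) -> List[str]:
        """Return the transcriptions of every region with the given class."""
        return [a.transcription for a in self.annotations
                if a.entity_class == entity_class]


# Hypothetical example: an invoice with two labeled regions on page 1.
label = DocumentLabel(filename="invoice_001.pdf", document_class="invoice")
label.annotations.append(OcrAnnotation(
    page=1,
    box=BoundingBox(0.62, 0.08, 0.90, 0.12),
    entity_class="INVOICE_NUMBER",
    transcription="INV-2023-0042",
))
label.annotations.append(OcrAnnotation(
    page=1,
    box=BoundingBox(0.70, 0.85, 0.90, 0.89),
    entity_class="TOTAL",
    transcription="1,250.00",
))

print(label.entities("INVOICE_NUMBER"))
print(asdict(label)["document_class"])
```

Normalized coordinates are used here so the same annotation stays valid if the page is rendered at a different resolution; pixel coordinates would work equally well if the render size is stored alongside.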

Why do you need to annotate PDFs?

Data annotation is an essential part of building ML models. If you're training a model to analyse documents (invoices, contracts, lease agreements, etc.), you need a dataset of training data. Building it involves several annotation tasks: document processing, document classification, image annotation, entity recognition, text transcription, text classification, and text annotation. With Kili Technology, you can do all of that using the right tools: bounding boxes, active learning, polygons, and much more.

Can OCR efficiently extract information from handwritten documents?

Today you can expect more than 90% accuracy for OCR-based data extraction using deep learning and other ML technologies, even for handwritten documents. The trick is not to rely on OCR technology alone for reliability. What works best is the combination of machine learning technologies and multiple OCR engines. Different OCR engines have different strengths: some work very well on scanned documents, while others are better at images captured on mobile devices. Once you apply data science and machine learning on top of the extracted data, you end up with something far more powerful than vanilla OCR. This is particularly useful for handwriting, where the quality of writing varies a lot. Deep learning can also be used to identify relevant sections for extraction, as well as to classify documents before extraction. This helps you understand what to expect from extraction.
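As a rough illustration of combining several OCR engines, the sketch below takes the text each engine returned for the same field and keeps the reading that at least two engines agree on, flagging the field for human review otherwise. Simple majority voting is an assumption made for this example, not the method described above, and the engine names and the `vote` and `normalize` helpers are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivially different readings match."""
    return " ".join(text.split()).upper()


def vote(readings: Dict[str, str],
         min_agreement: int = 2) -> Tuple[Optional[str], bool]:
    """Pick a consensus reading for one field read by several OCR engines.

    readings maps an engine name to the text that engine returned.
    Returns (consensus_text, needs_review). When fewer than min_agreement
    engines agree, the consensus is None and the field is flagged for review.
    """
    counts = Counter(normalize(t) for t in readings.values() if t.strip())
    if not counts:
        return None, True  # every engine returned empty text
    best, n_votes = counts.most_common(1)[0]
    if n_votes >= min_agreement:
        return best, False
    return None, True


# Hypothetical outputs from three engines for the same invoice-number field.
readings = {
    "engine_a": "INV-2023-0042",
    "engine_b": "inv-2023-0042 ",  # same reading, different case and spacing
    "engine_c": "INV-2O23-0042",   # letter O misread in place of zero
}
print(vote(readings))
```

Fields flagged for review are natural candidates to send to human annotators, whose corrections can then serve as training data.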

What is the best Neural Network Architecture to make an OCR?

A convolutional neural network would be the right type. The specific architecture is hard to prescribe because it depends heavily on the dataset. For low-resolution images, we would recommend fractional max pooling with bottlenecked DenseNets; for high-resolution images, we would recommend Inception-v4 or Inception-ResNet-v2. In any case, we encourage you to experiment and see what works best.

What is the best OCR software in 2023?

Abbyy FineReader, OmniPage Ultimate, and Adobe Acrobat Pro are among the best commercial OCR software options. When it comes to open source, the best option is Tesseract.

What is an OCR annotation example?

Our customers have many different use cases for OCR annotation. Examples include detecting serial numbers, extracting information from invoices, and blurring confidential information in images that contain text.

Get started! Build better data, now.