World Leading OCR Annotation Tool
Label documents and PDFs easily with Kili Technology. Import your assets, then label them using bounding boxes, segmentation, polygons, and text transcription. Loop in foundation models or your own models to generate pre-annotations. Explore your dataset to find and fix errors. Leave comments for your team. Integrate Kili Technology with your existing stack through our API and Python SDK.
10x Faster OCR Annotation on All Use Cases
Annotate documents and PDFs efficiently with Kili Technology. Select the entity class with shortcuts, draw the bounding box, and simply check the automatic OCR transcription. Add relations between items. Segment and classify images along with your image transcription. Use foundation models or your own model to generate pre-annotations.
Image & PDF native support (.pdf, .jpeg, .png, .tiff)
In-app image editing
10x Fewer Errors in OCR Dataset Annotation
Deliver datasets of the highest quality with Kili Technology. Streamline your quality review & fix issues in-app with our Explore view. Filter your assets to identify what to improve. Use advanced quality metrics to quantify your training data's quality. Automate programmatic QA with plugins and workflows for a seamless labeling process.
10x Easier OCR Labeling Ops
Integrate natively with your document processing stack, from Amazon, Google, and Microsoft cloud storage to ABBYY or Tesseract OCR. Simplify access rights management with predefined roles. Give your users an autonomous experience through our SSO integration and keep your IT and security teams happy. Use Kili Technology’s OCR annotation tool online or on-premise to facilitate collaboration between business experts, the external workforce, and data scientists.
Single Sign On (SSO)
API & Python SDK
They Trust Us
The right OCR tooling
Full coverage of annotation tasks: image tagging, text transcription, classification, relations
Support for all image files, native PDFs and image PDFs (.pdf, .jpeg, .png, .tiff, etc.)
AutoML & Foundation Models pre-labeling
Easy connection to standard OCR (Tesseract, Google Vision, Omnipage)
Fastest OCR tool with automation, nested ontologies, etc.
Quality focus with collaboration interfaces, user permissions, and human and programmatic error detection workflows
Smooth labeling ops with SSO, cloud storage, API and Python SDK access
The right expertise
On-demand labeling workforce
ML & data labeling experts
Highest security standards (SOC2, ISO27001, HIPAA, GDPR)
Highest levels of customer care with 24/7 support
An Optical Character Recognition (OCR) annotation tool like Kili Technology helps you accelerate the labeling of documents that contain text, such as PDFs or images. It combines natural language processing with image annotation.
How do you annotate a PDF?
You can annotate a PDF in different ways with annotation tools: place bounding boxes around images and write the text transcription (OCR annotation), place masks around images (object detection), highlight text passages (named entity recognition), or classify the PDF at the document or page level. This lets you annotate both machine-readable text and image files easily.
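To make the bounding-box-plus-transcription idea concrete, here is a minimal Python sketch of what a single OCR annotation record might hold. The schema is hypothetical: the names `BoundingBox` and `OcrAnnotation`, the `INVOICE_NUMBER` label, and the normalized-coordinate convention are assumptions for illustration, not Kili Technology's export format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class BoundingBox:
    # Coordinates normalized to [0, 1] relative to the page size (an assumption).
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Fraction of the page covered by the box; degenerate boxes give 0.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


@dataclass
class OcrAnnotation:
    page: int            # 1-based page number in the PDF
    label: str           # entity class chosen by the annotator (hypothetical name)
    box: BoundingBox     # where the text sits on the page
    transcription: str   # text typed or corrected by the annotator


def to_json(annotations):
    """Serialize a list of annotations to a JSON string."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


example = OcrAnnotation(
    page=1,
    label="INVOICE_NUMBER",
    box=BoundingBox(0.62, 0.08, 0.90, 0.12),
    transcription="INV-2023-0042",
)
print(to_json([example]))
```

Keeping the box and the transcription in one record means a model can be trained on both where the text is and what it says.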
Why do you need to annotate PDFs?
Data annotation is an essential part of building ML models. If you're training your model to analyse documents (invoices, contracts, lease agreements, etc.), you need to create a dataset of training data for your algorithm. To do that, you need to be able to perform the tasks the annotation process involves: document processing, document classification, image annotation, entity recognition, text transcription, text classification and text annotation. With Kili Technology, you can do all that using the right tools: bounding boxes, active learning, polygons, and much more.
Today you can expect more than 90% accuracy for OCR-based data extraction using deep learning and other ML technologies, even for handwritten documents. The trick is not to rely on OCR technology alone for reliability. What works best is the combination of machine learning technologies and multiple OCR engines. Different OCR engines have different strengths: some work really well on scanned documents, while others are better at images captured on mobile devices. Once you deploy data science and machine learning technologies on top of the extracted data, you end up with something far more potent than vanilla OCR. This is particularly useful for handwriting, where the quality of writing varies a lot. Deep learning can also be used to identify relevant sections for extraction, as well as to classify documents before extraction. This helps you understand what to expect from extraction.
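One simple way to combine several OCR engines is a majority vote on each extracted field. The Python sketch below illustrates that idea only; the function name `vote_on_field`, the sample readings, and the review threshold are assumptions, and a production system would more likely weigh engines by confidence or use a learned model on top.

```python
from collections import Counter


def vote_on_field(candidates):
    """Return (winning_text, agreement) from several engines' readings of one field."""
    # Collapse whitespace so "INV 42 " and "INV 42" count as the same reading,
    # and ignore engines that returned nothing.
    normalized = [" ".join(text.split()) for text in candidates if text and text.strip()]
    if not normalized:
        return "", 0.0
    winner, votes = Counter(normalized).most_common(1)[0]
    # Agreement is the share of non-empty readings that match the winner.
    return winner, votes / len(normalized)


# Hypothetical readings of the same invoice number from three engines.
readings = ["INV-2023-0042", "INV-2023-0042", "INV-2023-OO42"]
text, agreement = vote_on_field(readings)

# Fields on which the engines disagree can be routed to a human reviewer.
needs_review = agreement < 1.0
```

A low agreement score is a cheap signal for deciding which fields a human annotator should check first.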
What is the best Neural Network Architecture to make an OCR?
A convolutional neural network would be the right type. The specific architecture is hard to pin down because it depends heavily on the dataset. For low-resolution images, we would recommend fractional max pooling combined with bottlenecked DenseNets; for high-resolution images, we would recommend Inception-v4 or Inception-ResNet-v2. In any case, we encourage you to experiment and see what works best.
When it comes to commercial OCR tools, ABBYY FineReader, OmniPage Ultimate and Adobe Acrobat Pro are among the best. When it comes to open source, the main option is Tesseract.
Our customers have many different use cases for OCR annotation. Examples include detecting serial numbers and other information on invoices, and blurring confidential information in images that contain text.
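As a rough illustration of the serial-number and blurring use cases, the Python sketch below takes word-level OCR output and returns the boxes whose text looks like a serial number, so that a later step could blur them. The serial-number pattern, the `(text, box)` tuple layout, and the pixel coordinates are all assumptions made for this example.

```python
import re

# Hypothetical serial-number format: 2 to 4 capital letters, a dash, 6 or more digits.
SERIAL_PATTERN = re.compile(r"[A-Z]{2,4}-\d{6,}")


def find_boxes_to_blur(words, pattern=SERIAL_PATTERN):
    """Return the boxes of words whose full text matches the pattern.

    words: list of (text, (x_min, y_min, x_max, y_max)) tuples from an OCR pass.
    """
    return [box for text, box in words if pattern.fullmatch(text)]


# Made-up OCR output for one line of an invoice, in pixel coordinates.
ocr_words = [
    ("Invoice", (40, 30, 120, 50)),
    ("SN-00412987", (130, 30, 260, 50)),
    ("Total", (40, 80, 90, 100)),
]
boxes = find_boxes_to_blur(ocr_words)
```

In practice a trained entity-recognition model would usually replace the regular expression, which is exactly where annotated training data comes in.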