What is OCR?
OCR stands for Optical Character Recognition, the process of identifying text in an image. JustOCR processes PDF documents by recognising the text characters in them and adding those characters to the PDF's searchable text layer.
What is OCR Analysis?
OCR Analysis is a process we have developed to assess the quality of a PDF's searchable text. Making an image searchable through OCR is not a binary proposition; there are varying degrees of success. We have patented a process for assessing and scoring the quality of PDF text content.
Why is OCR Analysis important?
Keyword searches and predictive coding are the standard methods by which parties to litigation identify the documents they hold that are likely to be relevant. Both rely on documents being searchable in order to work effectively.
Previously, there was no way to know which documents were searchable and which were not. The usual approach was to OCR process all image-based files and then run searches across them, assuming the OCR process had been successful.
But what about PDFs that aren’t that searchable even after OCR? This has not generally been contemplated by parties preparing for litigation – either PDFs are ‘searchable’ (have been OCR processed) or they are not (have not been OCR processed); it has been viewed as a binary outcome.
What we realised is that there are many levels of text quality between these two endpoints. A document can be ‘searchable’ in that it has been OCR processed, but still not be amenable to searching because the OCR quality is poor.
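To illustrate the idea that searchability is a matter of degree, here is a minimal Python sketch. It is not the patented OCR Analysis process, whose details are not described here. It is a hypothetical heuristic, and the function name text_quality_score, the WORD_RE pattern and the sample strings are all assumptions made for illustration. It scores text by the fraction of tokens that look like plausible words or numbers.

```python
import re

# Assumed definition of a "plausible" token: an alphabetic word (optionally with
# internal apostrophes or hyphens) or a number, with optional trailing punctuation.
WORD_RE = re.compile(
    r"(?:[A-Za-z]+(?:['-][A-Za-z]+)*|\d+(?:[.,]\d+)*)[.,;:!?]?"
)


def text_quality_score(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look plausible.

    Hypothetical heuristic for illustration only; 0.0 means no plausible
    tokens (or no text at all), 1.0 means every token looks plausible.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    plausible = sum(1 for token in tokens if WORD_RE.fullmatch(token))
    return plausible / len(tokens)


# Both strings would count as 'searchable' in the binary sense, because both
# contain text, but only the first would be found by a search for "contract".
clean = "The contract was signed on 12 March by both parties."
garbled = "Th3 c0ntr@ct w4s s!gned 0n l2 M@rch by b0th p@rt1es."

print(text_quality_score(clean))    # 1.0
print(text_quality_score(garbled))  # 0.1
```

Under this toy measure, a poorly recognised page scores far lower than a clean one even though both have a text layer, which is the gap a binary ‘searchable or not’ view misses.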
Where is my data stored?
When you create your account, you will be prompted to select the AWS region in which your data will be stored.